
Commit cfa0278: updates

1 parent 61dd0ae commit cfa0278

3 files changed: +6, −151 lines


deploy-manage/distributed-architecture/discovery-cluster-formation.md

Lines changed: 3 additions & 1 deletion
@@ -5,7 +5,9 @@ applies_to:
   stack:
 ---
 
-% Discovery and cluster formation content (7 pages): add introductory note to specify that the endpoints/settings are possibly for self-managed only, and review the content.
+::::{important}
+The information provided in this section is applicable to all deployment types. However, the configuration settings detailed here are only valid for self-managed {{es}} deployments. For {{ecloud}} and {{serverless}} deployments, this section should only be used for general information.
+::::
 
 # Discovery and cluster formation [modules-discovery]
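The self-managed discovery settings the new note refers to are of the seed-hosts kind. A minimal sketch of an `elasticsearch.yml` fragment, with invented addresses and node names:

```yaml
# elasticsearch.yml (self-managed deployments only; values are illustrative)
discovery.seed_hosts:
  - 192.168.1.10:9300
  - seeds.example.com
cluster.initial_master_nodes:
  - master-node-a
```

On {{ecloud}} and {{serverless}}, discovery is managed for you and these settings are not exposed.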

deploy-manage/distributed-architecture/discovery-cluster-formation/discovery-hosts-providers.md

Lines changed: 0 additions & 17 deletions
@@ -71,31 +71,14 @@ Host names are allowed instead of IP addresses and are resolved by DNS as descri
 
 You can also add comments to this file. All comments must appear on their lines starting with `#` (i.e. comments cannot start in the middle of a line).
 
-<<<<<<< HEAD
-
-=======
->>>>>>> a4b41272 (update)
 #### EC2 hosts provider [ec2-hosts-provider]
 
 The [EC2 discovery plugin](elasticsearch://reference/elasticsearch-plugins/discovery-ec2.md) adds a hosts provider that uses the [AWS API](https://github.com/aws/aws-sdk-java) to find a list of seed nodes.
 
-<<<<<<< HEAD
-
-=======
->>>>>>> a4b41272 (update)
 #### Azure Classic hosts provider [azure-classic-hosts-provider]
 
 The [Azure Classic discovery plugin](elasticsearch://reference/elasticsearch-plugins/discovery-azure-classic.md) adds a hosts provider that uses the Azure Classic API find a list of seed nodes.
 
-<<<<<<< HEAD
-
 #### Google Compute Engine hosts provider [gce-hosts-provider]
 
 The [GCE discovery plugin](elasticsearch://reference/elasticsearch-plugins/discovery-gce.md) adds a hosts provider that uses the GCE API find a list of seed nodes.
-
-
-=======
-
-#### Google Compute Engine hosts provider [gce-hosts-provider]
-
-The [GCE discovery plugin](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch-plugins/discovery-gce.md) adds a hosts provider that uses the GCE API find a list of seed nodes.
-
->>>>>>> a4b41272 (update)
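The comment rule described in the context lines above (comments on their own lines, starting with `#`) can be illustrated with a small example hosts file for file-based discovery; the addresses and host names are invented:

```txt
# Seed addresses for cluster formation; the port defaults to the transport port when omitted
10.0.1.10:9300
10.0.1.11
# Host names are resolved by DNS
seed-node.example.com
```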

deploy-manage/distributed-architecture/kibana-tasks-management.md

Lines changed: 3 additions & 133 deletions
@@ -45,148 +45,18 @@ For detailed troubleshooting guidance, see [Troubleshooting](../../troubleshoot/
 
 ::::
 
-<<<<<<< HEAD
-
-
-=======
->>>>>>> f70338d9 (updates)
-## Deployment considerations [_deployment_considerations]
+## Scaling [task-manager-scaling-overview]
 
 {{es}} and {{kib}} instances use the system clock to determine the current time. To ensure schedules are triggered when expected, synchronize the clocks of all nodes in the cluster using a time service such as [Network Time Protocol](http://www.ntp.org/).
 
-<<<<<<< HEAD
-
-=======
->>>>>>> f70338d9 (updates)
-## Scaling guidance [task-manager-scaling-guidance]
+By default, {{kib}} polls for tasks at a rate of 10 tasks every 3 seconds. This means that you can expect a single {{kib}} instance to support up to 200 *tasks per minute* (`200/tpm`).
 
 How you deploy {{kib}} largely depends on your use case. Predicting the throughput a deployment might require to support Task Management is difficult because features can schedule an unpredictable number of tasks at a variety of scheduled cadences.
 
 However, there is a relatively straightforward method you can follow to produce a rough estimate based on your expected usage.
 
-<<<<<<< HEAD
-
-### Default scale [task-manager-default-scaling]
-=======
-### Default scale [task-manager-default-scaling]
->>>>>>> f70338d9 (updates)
-
 By default, {{kib}} polls for tasks at a rate of 10 tasks every 3 seconds. This means that you can expect a single {{kib}} instance to support up to 200 *tasks per minute* (`200/tpm`).
 
 In practice, a {{kib}} instance will only achieve the upper bound of `200/tpm` if the duration of task execution is below the polling rate of 3 seconds. For the most part, the duration of tasks is below that threshold, but it can vary greatly as {{es}} and {{kib}} usage grow and task complexity increases (such as alerts executing heavy queries across large datasets).
 
-By [estimating a rough throughput requirement](#task-manager-rough-throughput-estimation), you can estimate the number of {{kib}} instances required to reliably execute tasks in a timely manner. An appropriate number of {{kib}} instances can be estimated to match the required scale.
-
-For details on monitoring the health of {{kib}} Task Manager, follow the guidance in [Health monitoring](../monitor/kibana-task-manager-health-monitoring.md).
-
-<<<<<<< HEAD
-
-### Scaling horizontally [task-manager-scaling-horizontally]
-
-At times, the sustainable approach might be to expand the throughput of your cluster by provisioning additional {{kib}} instances. By default, each additional {{kib}} instance will add an additional 10 tasks that your cluster can run concurrently, but you can also scale each {{kib}} instance vertically, if your diagnosis indicates that they can handle the additional workload.
-
-
-### Scaling vertically [task-manager-scaling-vertically]
-=======
-### Scaling horizontally [task-manager-scaling-horizontally]
-
-At times, the sustainable approach might be to expand the throughput of your cluster by provisioning additional {{kib}} instances. By default, each additional {{kib}} instance will add an additional 10 tasks that your cluster can run concurrently, but you can also scale each {{kib}} instance vertically, if your diagnosis indicates that they can handle the additional workload.
-
-### Scaling vertically [task-manager-scaling-vertically]
->>>>>>> f70338d9 (updates)
-
-Other times it, might be preferable to increase the throughput of individual {{kib}} instances.
-
-Tweak the capacity with the [`xpack.task_manager.capacity`](kibana://reference/configuration-reference/task-manager-settings.md#task-manager-settings) setting, which enables each {{kib}} instance to pull a higher number of tasks per interval. This setting can impact the performance of each instance as the workload will be higher.
-
-Tweak the poll interval with the [`xpack.task_manager.poll_interval`](kibana://reference/configuration-reference/task-manager-settings.md#task-manager-settings) setting, which enables each {{kib}} instance to pull scheduled tasks at a higher rate. This setting can impact the performance of the {{es}} cluster as the workload will be higher.
-
-<<<<<<< HEAD
-
-### Choosing a scaling strategy [task-manager-choosing-scaling-strategy]
-=======
-### Choosing a scaling strategy [task-manager-choosing-scaling-strategy]
->>>>>>> f70338d9 (updates)
-
-Each scaling strategy comes with its own considerations, and the appropriate strategy largely depends on your use case.
-
-Scaling {{kib}} instances vertically causes higher resource usage in each {{kib}} instance, as it will perform more concurrent work. Scaling {{kib}} instances horizontally requires a higher degree of coordination, which can impact overall performance.
-
-A recommended strategy is to follow these steps:
-
-1. Produce a [rough throughput estimate](#task-manager-rough-throughput-estimation) as a guide to provisioning as many {{kib}} instances as needed. Include any growth in tasks that you predict experiencing in the near future, and a buffer to better address ad-hoc tasks.
-2. After provisioning a deployment, assess whether the provisioned {{kib}} instances achieve the required throughput by evaluating the [Health monitoring](../monitor/kibana-task-manager-health-monitoring.md) as described in [Insufficient throughput to handle the scheduled workload](../../troubleshoot/kibana/task-manager.md#task-manager-theory-insufficient-throughput).
-3. If the throughput is insufficient, and {{kib}} instances exhibit low resource usage, incrementally scale vertically while [monitoring](../monitor/monitoring-data/kibana-page.md) the impact of these changes.
-4. If the throughput is insufficient, and {{kib}} instances are exhibiting high resource usage, incrementally scale horizontally by provisioning new {{kib}} instances and reassess.
-
-Task Manager, like the rest of the Elastic Stack, is designed to scale horizontally. Take advantage of this ability to ensure mission critical services, such as Alerting, Actions, and Reporting, always have the capacity they need.
-
-Scaling horizontally requires a higher degree of coordination between {{kib}} instances. One way Task Manager coordinates with other instances is by delaying its polling schedule to avoid conflicts with other instances. By using [health monitoring](../monitor/kibana-task-manager-health-monitoring.md) to evaluate the [date of the `last_polling_delay`](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-runtime) across a deployment, you can estimate the frequency at which Task Manager resets its delay mechanism. A higher frequency suggests {{kib}} instances conflict at a high rate, which you can address by scaling vertically rather than horizontally, reducing the required coordination.
-
-<<<<<<< HEAD
-
-### Rough throughput estimation [task-manager-rough-throughput-estimation]
-=======
-### Rough throughput estimation [task-manager-rough-throughput-estimation]
->>>>>>> f70338d9 (updates)
-
-Predicting the required throughput a deployment might need to support Task Management is difficult, as features can schedule an unpredictable number of tasks at a variety of scheduled cadences. However, a rough lower bound can be estimated, which is then used as a guide.
-
-Throughput is best thought of as a measurements in tasks per minute.
-
-A default {{kib}} instance can support up to `200/tpm`.
-
-#### Automatic estimation [_automatic_estimation]
-
-<<<<<<< HEAD
-#### Automatic estimation [_automatic_estimation]
-
-=======
->>>>>>> f70338d9 (updates)
-::::{warning}
-This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
-::::
-
-As demonstrated in [Evaluate your capacity estimation](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-capacity-estimation), the Task Manager [health monitoring](../monitor/kibana-task-manager-health-monitoring.md) performs these estimations automatically.
-
-These estimates are based on historical data and should not be used as predictions, but can be used as a rough guide when scaling the system.
-
-We recommend provisioning enough {{kib}} instances to ensure a buffer between the observed maximum throughput (as estimated under `observed.max_throughput_per_minute`) and the average required throughput (as estimated under `observed.avg_required_throughput_per_minute`). Otherwise there might be insufficient capacity to handle spikes of ad-hoc tasks. How much of a buffer is needed largely depends on your use case, but keep in mind that estimated throughput takes into account recent spikes and, as long as they are representative of your system’s behaviour, shouldn’t require much of a buffer.
-
-We recommend provisioning at least as many {{kib}} instances as proposed by `proposed.provisioned_kibana`, but keep in mind that this number is based on the estimated required throughput, which is based on average historical performance, and cannot accurately predict future requirements.
-
-::::{warning}
-Automatic capacity estimation is performed by each {{kib}} instance independently. This estimation is performed by observing the task throughput in that instance, the number of {{kib}} instances executing tasks at that moment in time, and the recurring workload in {{es}}.
-
-If a {{kib}} instance is idle at the moment of capacity estimation, the number of active {{kib}} instances might be miscounted and the available throughput miscalculated.
-
-When evaluating the proposed {{kib}} instance number under `proposed.provisioned_kibana`, we highly recommend verifying that the `observed.observed_kibana_instances` matches the number of provisioned {{kib}} instances.
-::::
-
-<<<<<<< HEAD
-
-
-=======
->>>>>>> f70338d9 (updates)
-#### Manual estimation [_manual_estimation]
-
-By [evaluating the workload](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-workload), you can make a rough estimate as to the required throughput as a *tasks per minute* measurement.
-
-For example, suppose your current workload reveals a required throughput of `440/tpm`. You can address this scale by provisioning 3 {{kib}} instances, with an upper throughput of `600/tpm`. This scale would provide approximately 25% additional capacity to handle ad-hoc non-recurring tasks and potential growth in recurring tasks.
-
-Given a deployment of 100 recurring tasks, estimating the required throughput depends on the scheduled cadence. Suppose you expect to run 50 tasks at a cadence of `10s`, the other 50 tasks at `20m`. In addition, you expect a couple dozen non-recurring tasks every minute.
-
-A non-recurring task requires a single execution, which means that a single {{kib}} instance could execute all 100 tasks in less than a minute, using only half of its capacity. As these tasks are only executed once, the {{kib}} instance will sit idle once all tasks are executed. For that reason, don’t include non-recurring tasks in your *tasks per minute* calculation. Instead, include a buffer in the final *lower bound* to incur the cost of ad-hoc non-recurring tasks.
-
-A recurring task requires as many executions as its cadence can fit in a minute. A recurring task with a `10s` schedule will require `6/tpm`, as it will execute 6 times per minute. A recurring task with a `20m` schedule only executes 3 times per hour and only requires a throughput of `0.05/tpm`, a number so small it that is difficult to take it into account.
-
-For this reason, we recommend grouping tasks by *tasks per minute* and *tasks per hour*, as demonstrated in [Evaluate your workload](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-workload), averaging the *per hour* measurement across all minutes.
-
-It is highly recommended that you maintain at least 20% additional capacity, beyond your expected workload, as spikes in ad-hoc tasks is possible at times of high activity (such as a spike in actions in response to an active alert).
-
-Given the predicted workload, you can estimate a lower bound throughput of `340/tpm` (`6/tpm` * 50 + `3/tph` * 50 + 20% buffer). As a default, a {{kib}} instance provides a throughput of `200/tpm`. A good starting point for your deployment is to provision 2 {{kib}} instances. You could then monitor their performance and reassess as the required throughput becomes clearer.
-
-Although this is a *rough* estimate, the *tasks per minute* provides the lower bound needed to execute tasks on time.
-
-Once you estimate *tasks per minute* , add a buffer for non-recurring tasks. How much of a buffer is required largely depends on your use case. Ensure enough of a buffer is provisioned by [evaluating your workload](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-workload) as it grows and tracking the ratio of recurring to non-recurring tasks by [evaluating your runtime](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-runtime).
+For more information on scaling, see [Kibana task manager scaling considerations](../../deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md).
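The manual throughput estimation removed by this diff can be sketched as a short script. `estimate_tpm` is an invented helper for illustration, not a Kibana API; the workload numbers come from the worked example in the removed text (50 tasks on a `10s` cadence, 50 on a `20m` cadence, 20% buffer):

```python
import math

def estimate_tpm(recurring, buffer=0.20):
    """Rough lower-bound throughput in tasks per minute (tpm):
    sum the recurring task rates, then add a buffer for ad-hoc
    non-recurring tasks."""
    # recurring: iterable of (task_count, executions_per_minute)
    base = sum(count * rate for count, rate in recurring)
    return base * (1 + buffer)

# 50 tasks every 10s -> 6 executions/minute each;
# 50 tasks every 20m -> 3 executions/hour -> 0.05/minute each.
workload = [(50, 6.0), (50, 3.0 / 60.0)]

tpm = estimate_tpm(workload)
# A default Kibana instance supports up to 200/tpm.
instances = math.ceil(tpm / 200)
print(round(tpm), instances)  # prints: 363 2
```

This handles the buffer slightly differently from the removed text's quoted figure, but reaches the same conclusion: provision 2 default {{kib}} instances as a starting point, then monitor and reassess.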
