Commit a78fa6d

committed: updates
1 parent fb66746 commit a78fa6d

7 files changed: +188 −96 lines changed

deploy-manage/distributed-architecture/clusters-nodes-shards.md

Lines changed: 3 additions & 5 deletions

@@ -5,7 +5,7 @@ mapped_pages:

# Clusters, nodes, and shards [nodes-shards]

-::::{note}
+::::{note}
Nodes and shards are what make {{es}} distributed and scalable. These concepts aren’t essential if you’re just getting started. How you [deploy {{es}}](../../get-started/deployment-options.md) in production determines what you need to know:

* **Self-managed {{es}}**: You are responsible for setting up and managing nodes, clusters, shards, and replicas. This includes managing the underlying infrastructure, scaling, and ensuring high availability through failover and backup strategies.

@@ -21,17 +21,15 @@ Elastic is able to distribute your data across nodes by subdividing an index int

There are two types of shards: *primaries* and *replicas*. Each document in an index belongs to one primary shard. A replica shard is a copy of a primary shard. Replicas maintain redundant copies of your data across the nodes in your cluster. This protects against hardware failure and increases capacity to serve read requests like searching or retrieving a document.

-::::{tip}
+::::{tip}
The number of primary shards in an index is fixed at the time that an index is created, but the number of replica shards can be changed at any time, without interrupting indexing or query operations.

::::

-
Shard copies in your cluster are automatically balanced across nodes to provide scale and high availability. All nodes are aware of all the other nodes in the cluster and can forward client requests to the appropriate node. This allows {{es}} to distribute indexing and query load across the cluster.

If you’re exploring {{es}} for the first time or working in a development environment, then you can use a cluster with a single node and create indices with only one shard. However, in a production environment, you should build a cluster with multiple nodes and indices with multiple shards to increase performance and resilience.

* To learn about optimizing the number and size of shards in your cluster, refer to [Size your shards](../production-guidance/optimize-performance/size-shards.md).
* To learn about how read and write operations are replicated across shards and shard copies, refer to [Reading and writing documents](reading-and-writing-documents.md).
-* To adjust how shards are allocated and balanced across nodes, refer to [Shard allocation, relocation, and recovery](shard-allocation-relocation-recovery.md).
-
+* To adjust how shards are allocated and balanced across nodes, refer to [Shard allocation, relocation, and recovery](shard-allocation-relocation-recovery.md).
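To make the tip in this file concrete, here is a minimal sketch using the {{es}} REST API: the primary shard count is fixed when the index is created, while the replica count can be raised or lowered later through the update settings API. The index name `my-index` and the counts shown are illustrative, not taken from the commit.

```console
PUT /my-index
{
  "settings": {
    "index.number_of_shards": 3,
    "index.number_of_replicas": 1
  }
}

PUT /my-index/_settings
{
  "index.number_of_replicas": 2
}
```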

deploy-manage/distributed-architecture/discovery-cluster-formation.md

Lines changed: 8 additions & 23 deletions

@@ -9,33 +9,18 @@ The discovery and cluster formation processes are responsible for discovering no

The following processes and settings are part of discovery and cluster formation:

-[Discovery](discovery-cluster-formation/discovery-hosts-providers.md)
-: Discovery is the process where nodes find each other when the master is unknown, such as when a node has just started up or when the previous master has failed.
-
-[Quorum-based decision making](discovery-cluster-formation/modules-discovery-quorums.md)
-: How {{es}} uses a quorum-based voting mechanism to make decisions even if some nodes are unavailable.
-
-[Voting configurations](discovery-cluster-formation/modules-discovery-voting.md)
-: How {{es}} automatically updates voting configurations as nodes leave and join a cluster.
-
-[Bootstrapping a cluster](discovery-cluster-formation/modules-discovery-bootstrap-cluster.md)
-: Bootstrapping a cluster is required when an {{es}} cluster starts up for the very first time. In [development mode](../deploy/self-managed/bootstrap-checks.md#dev-vs-prod-mode), with no discovery settings configured, this is automatically performed by the nodes themselves. As this auto-bootstrapping is [inherently unsafe](discovery-cluster-formation/modules-discovery-quorums.md), running a node in [production mode](../deploy/self-managed/bootstrap-checks.md#dev-vs-prod-mode) requires bootstrapping to be [explicitly configured](discovery-cluster-formation/modules-discovery-bootstrap-cluster.md).
-
-[Adding and removing master-eligible nodes](../maintenance/add-and-remove-elasticsearch-nodes.md)
-: It is recommended to have a small and fixed number of master-eligible nodes in a cluster, and to scale the cluster up and down by adding and removing master-ineligible nodes only. However, there are situations in which it may be desirable to add or remove some master-eligible nodes to or from a cluster. This section describes the process for adding or removing master-eligible nodes, including the extra steps that need to be performed when removing more than half of the master-eligible nodes at the same time.
-
-[Publishing the cluster state](discovery-cluster-formation/cluster-state-overview.md#cluster-state-publishing)
-: Cluster state publishing is the process by which the elected master node updates the cluster state on all the other nodes in the cluster.
-
-[Cluster fault detection](discovery-cluster-formation/cluster-fault-detection.md)
-: {{es}} performs health checks to detect and remove faulty nodes.
-
-[Settings](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/configuration-reference/discovery-cluster-formation-settings.md)
-: There are settings that enable users to influence the discovery, cluster formation, master election and fault detection processes.
+[Discovery](discovery-cluster-formation/discovery-hosts-providers.md): Discovery is the process where nodes find each other when the master is unknown, such as when a node has just started up or when the previous master has failed.

+[Quorum-based decision making](discovery-cluster-formation/modules-discovery-quorums.md): How {{es}} uses a quorum-based voting mechanism to make decisions even if some nodes are unavailable.

+[Voting configurations](discovery-cluster-formation/modules-discovery-voting.md): How {{es}} automatically updates voting configurations as nodes leave and join a cluster.

+[Bootstrapping a cluster](discovery-cluster-formation/modules-discovery-bootstrap-cluster.md): Bootstrapping a cluster is required when an {{es}} cluster starts up for the very first time. In [development mode](../deploy/self-managed/bootstrap-checks.md#dev-vs-prod-mode), with no discovery settings configured, this is automatically performed by the nodes themselves. As this auto-bootstrapping is [inherently unsafe](discovery-cluster-formation/modules-discovery-quorums.md), running a node in [production mode](../deploy/self-managed/bootstrap-checks.md#dev-vs-prod-mode) requires bootstrapping to be [explicitly configured](discovery-cluster-formation/modules-discovery-bootstrap-cluster.md).

+[Adding and removing master-eligible nodes](../maintenance/add-and-remove-elasticsearch-nodes.md): It is recommended to have a small and fixed number of master-eligible nodes in a cluster, and to scale the cluster up and down by adding and removing master-ineligible nodes only. However, there are situations in which it may be desirable to add or remove some master-eligible nodes to or from a cluster. This section describes the process for adding or removing master-eligible nodes, including the extra steps that need to be performed when removing more than half of the master-eligible nodes at the same time.

+[Publishing the cluster state](discovery-cluster-formation/cluster-state-overview.md#cluster-state-publishing): Cluster state publishing is the process by which the elected master node updates the cluster state on all the other nodes in the cluster.

+[Cluster fault detection](discovery-cluster-formation/cluster-fault-detection.md): {{es}} performs health checks to detect and remove faulty nodes.

+[Settings](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/configuration-reference/discovery-cluster-formation-settings.md): There are settings that enable users to influence the discovery, cluster formation, master election and fault detection processes.
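For context on the bootstrapping and discovery settings this file links to, here is a minimal `elasticsearch.yml` sketch for a hypothetical three-node production cluster; the node names and addresses are invented for illustration.

```yaml
# Addresses of master-eligible nodes that a starting node contacts
# during discovery, when the elected master is unknown
discovery.seed_hosts:
  - es-node-1.example.internal
  - es-node-2.example.internal
  - es-node-3.example.internal

# Explicit bootstrap configuration, required the very first time a
# production-mode cluster starts; remove it once the cluster has formed
cluster.initial_master_nodes:
  - es-node-1
  - es-node-2
  - es-node-3
```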

deploy-manage/distributed-architecture/kibana-tasks-management.md

Lines changed: 11 additions & 30 deletions

@@ -20,21 +20,17 @@ If you lose this index, all scheduled alerts and actions are lost.

::::

-
-
-## Running background tasks [task-manager-background-tasks]
+## Running background tasks [task-manager-background-tasks]

{{kib}} background tasks are managed as follows:

* An {{es}} task index is polled for overdue tasks at 3-second intervals. You can change this interval using the [`xpack.task_manager.poll_interval`](asciidocalypse://docs/kibana/docs/reference/configuration-reference/task-manager-settings.md#task-manager-settings) setting.
* Tasks are claimed by updating them in the {{es}} index, using optimistic concurrency control to prevent conflicts. Each {{kib}} instance can run a maximum of 10 concurrent tasks, so a maximum of 10 tasks are claimed each interval.
* Tasks are run on the {{kib}} server.
* Task Manager ensures that tasks:
-
-  * Are only executed once
-  * Are retried when they fail (if configured to do so)
-  * Are rescheduled to run again at a future point in time (if configured to do so)
-
+  * Are only executed once
+  * Are retried when they fail (if configured to do so)
+  * Are rescheduled to run again at a future point in time (if configured to do so)

::::{important}
It is possible for tasks to run late or at an inconsistent schedule.

@@ -49,20 +45,16 @@ For detailed troubleshooting guidance, see [Troubleshooting](../../troubleshoot/

::::

-
-
-## Deployment considerations [_deployment_considerations]
+## Deployment considerations [_deployment_considerations]

{{es}} and {{kib}} instances use the system clock to determine the current time. To ensure schedules are triggered when expected, synchronize the clocks of all nodes in the cluster using a time service such as [Network Time Protocol](http://www.ntp.org/).

-
-## Scaling guidance [task-manager-scaling-guidance]
+## Scaling guidance [task-manager-scaling-guidance]

How you deploy {{kib}} largely depends on your use case. Predicting the throughput a deployment might require to support Task Management is difficult because features can schedule an unpredictable number of tasks at a variety of scheduled cadences.

However, there is a relatively straightforward method you can follow to produce a rough estimate based on your expected usage.

-
### Default scale [task-manager-default-scaling]

By default, {{kib}} polls for tasks at a rate of 10 tasks every 3 seconds. This means that you can expect a single {{kib}} instance to support up to 200 *tasks per minute* (`200/tpm`).
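As a back-of-the-envelope check on the `200/tpm` figure, here is a small Python sketch of the arithmetic implied above (10 tasks claimed per poll, one poll every 3 seconds); the function and its defaults are illustrative, not part of Kibana.

```python
def tasks_per_minute(tasks_per_poll: int = 10, poll_interval_ms: int = 3000) -> float:
    """Rough per-instance Task Manager throughput ceiling."""
    polls_per_minute = 60_000 / poll_interval_ms  # 20 polls per minute by default
    return tasks_per_poll * polls_per_minute

print(tasks_per_minute())  # 200.0 -> the default `200/tpm` ceiling
```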
@@ -73,12 +65,10 @@ By [estimating a rough throughput requirement](#task-manager-rough-throughput-es

For details on monitoring the health of {{kib}} Task Manager, follow the guidance in [Health monitoring](../monitor/kibana-task-manager-health-monitoring.md).

-
### Scaling horizontally [task-manager-scaling-horizontally]

At times, the sustainable approach might be to expand the throughput of your cluster by provisioning additional {{kib}} instances. By default, each additional {{kib}} instance will add an additional 10 tasks that your cluster can run concurrently, but you can also scale each {{kib}} instance vertically, if your diagnosis indicates that they can handle the additional workload.

-
### Scaling vertically [task-manager-scaling-vertically]

Other times, it might be preferable to increase the throughput of individual {{kib}} instances.

@@ -87,7 +77,6 @@ Tweak the capacity with the [`xpack.task_manager.capacity`](asciidocalypse://doc

Tweak the poll interval with the [`xpack.task_manager.poll_interval`](asciidocalypse://docs/kibana/docs/reference/configuration-reference/task-manager-settings.md#task-manager-settings) setting, which enables each {{kib}} instance to pull scheduled tasks at a higher rate. This setting can impact the performance of the {{es}} cluster as the workload will be higher.

-
### Choosing a scaling strategy [task-manager-choosing-scaling-strategy]

Each scaling strategy comes with its own considerations, and the appropriate strategy largely depends on your use case.
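Putting the two vertical-scaling settings together, a hypothetical `kibana.yml` snippet follows; the values are examples only (the defaults, per the text above, are 10 concurrent tasks and a 3000 ms poll interval), and raising either one increases the load on {{es}}.

```yaml
# Run up to 20 concurrent tasks per Kibana instance (default: 10)
xpack.task_manager.capacity: 20

# Poll the task index every 2 seconds instead of every 3 (default: 3000)
xpack.task_manager.poll_interval: 2000
```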
@@ -105,7 +94,6 @@ Task Manager, like the rest of the Elastic Stack, is designed to scale horizonta

Scaling horizontally requires a higher degree of coordination between {{kib}} instances. One way Task Manager coordinates with other instances is by delaying its polling schedule to avoid conflicts with other instances. By using [health monitoring](../monitor/kibana-task-manager-health-monitoring.md) to evaluate the [date of the `last_polling_delay`](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-runtime) across a deployment, you can estimate the frequency at which Task Manager resets its delay mechanism. A higher frequency suggests {{kib}} instances conflict at a high rate, which you can address by scaling vertically rather than horizontally, reducing the required coordination.

-
### Rough throughput estimation [task-manager-rough-throughput-estimation]

Predicting the required throughput a deployment might need to support Task Management is difficult, as features can schedule an unpredictable number of tasks at a variety of scheduled cadences. However, a rough lower bound can be estimated, which is then used as a guide.

@@ -114,14 +102,12 @@ Throughput is best thought of as a measurement in tasks per minute.

A default {{kib}} instance can support up to `200/tpm`.

+#### Automatic estimation [_automatic_estimation]

-#### Automatic estimation [_automatic_estimation]
-
-::::{warning}
+::::{warning}
This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
::::

-
As demonstrated in [Evaluate your capacity estimation](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-capacity-estimation), the Task Manager [health monitoring](../monitor/kibana-task-manager-health-monitoring.md) performs these estimations automatically.

These estimates are based on historical data and should not be used as predictions, but can be used as a rough guide when scaling the system.

@@ -130,7 +116,7 @@ We recommend provisioning enough {{kib}} instances to ensure a buffer between th

We recommend provisioning at least as many {{kib}} instances as proposed by `proposed.provisioned_kibana`, but keep in mind that this number is based on the estimated required throughput, which is based on average historical performance, and cannot accurately predict future requirements.

-::::{warning}
+::::{warning}
Automatic capacity estimation is performed by each {{kib}} instance independently. This estimation is performed by observing the task throughput in that instance, the number of {{kib}} instances executing tasks at that moment in time, and the recurring workload in {{es}}.

If a {{kib}} instance is idle at the moment of capacity estimation, the number of active {{kib}} instances might be miscounted and the available throughput miscalculated.

@@ -139,9 +125,7 @@ When evaluating the proposed {{kib}} instance number under `proposed.provisioned

::::

-
-
-#### Manual estimation [_manual_estimation]
+#### Manual estimation [_manual_estimation]

By [evaluating the workload](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-workload), you can make a rough estimate as to the required throughput as a *tasks per minute* measurement.

@@ -161,7 +145,4 @@ Given the predicted workload, you can estimate a lower bound throughput of `340/

Although this is a *rough* estimate, the *tasks per minute* provides the lower bound needed to execute tasks on time.

-Once you estimate *tasks per minute*, add a buffer for non-recurring tasks. How much of a buffer is required largely depends on your use case. Ensure enough of a buffer is provisioned by [evaluating your workload](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-workload) as it grows and tracking the ratio of recurring to non-recurring tasks by [evaluating your runtime](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-runtime).
-
-
-
+Once you estimate *tasks per minute*, add a buffer for non-recurring tasks. How much of a buffer is required largely depends on your use case. Ensure enough of a buffer is provisioned by [evaluating your workload](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-workload) as it grows and tracking the ratio of recurring to non-recurring tasks by [evaluating your runtime](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-runtime).
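As a worked version of the manual method, here is a short Python sketch that sums recurring tasks into a *tasks per minute* lower bound and converts it into a default-scale instance count; the workload numbers are invented to land on the `340/tpm` figure cited above and are not the original example's data.

```python
import math

# (number of scheduled recurring tasks, schedule interval in seconds) -- invented
workload = [
    (170, 60),  # e.g. 170 rules running once per minute -> 170/tpm
    (85, 30),   # e.g. 85 rules running every 30 seconds -> 170/tpm
]

required_tpm = sum(count * 60 / interval for count, interval in workload)
print(f"estimated lower bound: {required_tpm:.0f}/tpm")  # 340/tpm

DEFAULT_INSTANCE_TPM = 200  # default-scale ceiling, per the guidance above
instances = math.ceil(required_tpm / DEFAULT_INSTANCE_TPM)
print(f"default-scale Kibana instances needed: {instances}")  # 2, before any buffer
```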
