
Commit 7cea06b

Merge branch 'main' into apm-9-upgrade
2 parents 5c384d2 + 17707b3 commit 7cea06b

File tree

3,396 files changed: +1791 -4324 lines changed

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+name: "Docs issue"
+description: Report documentation issues such as inaccuracies, broken links, typos, or missing information.
+title: "[Issue]: "
+labels: ["triage"]
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Hi 👋. Thanks for taking the time to fill out this issue report!
+        This form will create an issue that Elastic's docs team will triage and prioritize.
+        You can also open a PR instead.
+  - type: dropdown
+    attributes:
+      label: Type of issue
+      description: What type of issue are you reporting?
+      multiple: false
+      options:
+        - Inaccurate
+        - Missing information
+        - I can't find what I'm looking for
+        - Other
+  - type: input
+    id: link
+    attributes:
+      label: What documentation page is affected
+      description: Include a link to the page where you're seeing the problem.
+    validations:
+      required: true
+  - type: textarea
+    id: related
+    attributes:
+      label: What happened?
+      description: Describe the issue you're experiencing. Screenshots are valuable too!
+    validations:
+      required: true
+  - type: textarea
+    id: moreinfo
+    attributes:
+      label: Additional info
+      description: Anything else we should know?
+    validations:
+      required: false
Lines changed: 28 additions & 42 deletions
@@ -1,74 +1,60 @@
---
+navigation_title: High availability
applies_to:
  deployment:
    ece: all
mapped_pages:
  - https://www.elastic.co/guide/en/cloud-enterprise/current/ece-ha.html
---

-# High availability [ece-ha]
+# High availability in ECE

-Ensuring high availability in {{ece}} (ECE) requires careful planning and implementation across multiple areas, including availability zones, master nodes, replica shards, snapshot backups, and Zookeeper nodes.
+Ensuring high availability (HA) in {{ece}} (ECE) requires careful planning and implementation across multiple areas, including availability zones, master nodes, replica shards, snapshot backups, and ZooKeeper nodes.

-This section describes key considerations and best practices to prevent downtime and data loss at both the ECE platform level and within orchestrated deployments.
-
-## Availability zones [ece-ece-ha-1-az]
-
-Fault tolerance for ECE is based around the concept of *availability zones*.
-
-An availability zone contains resources available to an ECE installation that are isolated from other availability zones to safeguard against potential failure.
-
-Planning for a fault-tolerant installation with multiple availability zones means avoiding any single point of failure that could bring down ECE.
-
-The main difference between ECE installations that include two or three availability zones is that three availability zones enable ECE to create clusters with a *tiebreaker*. If you have only two availability zones in total in your installation, no tiebreaker is created.
+::::{note}
+This section focuses on ensuring high availability at the ECE platform level. For deployment-level considerations, including resiliency, scaling, and performance optimizations for running {{es}} and {{kib}}, refer to the general [production guidance](/deploy-manage/production-guidance.md).
+::::

-We recommend that for each deployment you use at least two availability zones for production and three for mission-critical systems. Using more than three availability zones for a deployment is not required nor supported. Availability zones are intended for high availability, not scalability.
+To maintain minimum HA, you should deploy at least two ECE hosts for each role—**allocator, constructor, and proxy**—and at least three hosts for the **director** role, which runs ZooKeeper and requires a quorum to operate reliably.

-::::{warning}
-{{es}} clusters that are set up to use only one availability zone are not [highly available](/deploy-manage/production-guidance/availability-and-resilience.md) and are at risk of data loss. To safeguard against data loss, you must use at least two {{ece}} availability zones.
-::::
+In addition, to improve resiliency at the availability zone level, it’s recommended to deploy ECE across three availability zones, with at least two allocators per zone and spare capacity to accommodate instance failover and workload redistribution in case of failures.

-::::{warning}
-Increasing the number of zones should not be used to add more resources. The concept of zones is meant for High Availability (2 zones) and Fault Tolerance (3 zones), but neither will work if the cluster relies on the resources from those zones to be operational. The recommendation is to scale up the resources within a single zone until the cluster can take the full load (add some buffer to be prepared for a peak of requests), then scale out by adding additional zones depending on your requirements: 2 zones for High Availability, 3 zones for Fault Tolerance.
-::::
+All Elastic-documented architectures recommend using three availability zones with ECE roles distributed across all zones. Refer to [deployment scenarios](./identify-deployment-scenario.md) for examples of small, medium, and large installations.

+Regardless of the resiliency achieved at the platform level, it’s also important to [configure your deployments for high availability](/deploy-manage/production-guidance/availability-and-resilience/resilience-in-ech.md).

-## Master nodes [ece-ece-ha-2-master-nodes]
+## Availability zones [ece-ece-ha-1-az]

-Tiebreakers are used in distributed clusters to avoid cases of [split brain](https://en.wikipedia.org/wiki/Split-brain_(computing)), where an {{es}} cluster splits into multiple, autonomous parts that continue to handle requests independently of each other, at the risk of affecting cluster consistency and data loss. A split-brain scenario is avoided by making sure that a minimum number of [master-eligible nodes](elasticsearch://reference/elasticsearch/configuration-reference/node-settings.md#master-node) must be present in order for any part of the cluster to elect a master node and accept user requests. To prevent multiple parts of a cluster from being eligible, there must be a [quorum-based majority](/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-quorums.md) of `(n/2)+1` nodes, where `n` is the number of master-eligible nodes in the cluster. The minimum number of master nodes to reach quorum in a two-node cluster is the same as for a three-node cluster: two nodes must be available.
+Fault tolerance for ECE is based around the concept of *availability zones*.

-When you create a cluster with nodes in two availability zones when a third zone is available, ECE can create a tiebreaker in the third availability zone to help establish quorum in case of loss of an availability zone. The extra tiebreaker node that helps to provide quorum does not have to be a full-fledged and expensive node, as it does not hold data. For example: By tagging allocators hosts in ECE, can you create a cluster with eight nodes each in zones `ece-1a` and `ece-1b`, for a total of 16 nodes, and one tiebreaker node in zone `ece-1c`. This cluster can lose any of the three availability zones whilst maintaining quorum, which means that the cluster can continue to process user requests, provided that there is sufficient capacity available when an availability zone goes down.
+An availability zone contains resources available to an ECE installation that are isolated from other availability zones to safeguard against potential failure.

-By default, each node in an {{es}} cluster is a master-eligible node and a data node. In larger clusters, such as production clusters, it’s a good practice to split the roles, so that master nodes are not handling search or indexing work. When you create a cluster, you can specify to use dedicated [master-eligible nodes](elasticsearch://reference/elasticsearch/configuration-reference/node-settings.md#master-node), one per availability zone.
+Planning for a fault-tolerant installation with multiple availability zones means avoiding any single point of failure that could bring down ECE.

-::::{warning}
-Clusters that only have two or fewer master-eligible node are not [highly available](/deploy-manage/production-guidance/availability-and-resilience.md) and are at risk of data loss. You must have [at least three master-eligible nodes](/deploy-manage/distributed-architecture/discovery-cluster-formation/modules-discovery-quorums.md).
+::::{important}
+Adding more availability zones should not be used as a way to increase processing capacity and performance. The concept of zones is meant for high availability (2 zones) and fault tolerance (3 zones), but neither will work if your deployments rely on the resources from those zones to be operational. Refer to [scaling considerations](/deploy-manage/production-guidance/scaling-considerations.md#scaling-and-fault-tolerance) for more information.
::::

-## Replica shards [ece-ece-ha-3-replica-shards]
+The main difference between ECE installations that include two or three availability zones is that three availability zones enable ECE to create {{es}} clusters with a [voting-only tiebreaker](/deploy-manage/distributed-architecture/clusters-nodes-shards/node-roles.md#voting-only-node) instance. If you have only two availability zones in your installation, no tiebreaker can be placed in a third zone, limiting the cluster’s ability to tolerate certain failures.

-With multiple {{es}} nodes in multiple availability zones you have the recommended hardware, the next thing to consider is having the recommended index replication. Each index, with the exception of searchable snapshot indexes, should have one or more replicas. Use the index settings API to find any indices with no replica:
+## Tiebreaker master nodes

-```sh
-GET _all/_settings/index.number_of_replicas
-```
+A tiebreaker is a lightweight voting-only node used in distributed clusters to help avoid split-brain scenarios, where the cluster could incorrectly split into multiple autonomous parts during a network partition.

-::::{warning}
-Indices with no replica, except for [searchable snapshot indices](/deploy-manage/tools/snapshot-and-restore/searchable-snapshots.md), are not highly available. You should use replicas to mitigate against possible data loss.
-::::
+When you create a cluster with nodes in two availability zones and a third zone is available, ECE can create a tiebreaker in the third availability zone to help establish quorum if an availability zone is lost. The extra tiebreaker node that helps to provide quorum does not have to be a full-fledged and expensive node, as it does not hold data. For example, by [tagging allocator hosts](./ece-configuring-ece-tag-allocators.md) in ECE, you can create a cluster with eight nodes each in zones `ece-1a` and `ece-1b`, for a total of 16 nodes, and one tiebreaker node in zone `ece-1c`. This cluster can lose any one of the three availability zones whilst maintaining quorum, which means that the cluster can continue to process user requests, provided that there is sufficient capacity available when an availability zone goes down.

-Refer to [](../../reference-architectures.md) for information about {{es}} architectures.
+## ZooKeeper nodes

-## Snapshot backups [ece-ece-ha-4-snapshot]
+Make sure you have three ZooKeeper nodes (running by default on the director hosts) in your ECE installation. Just as three {{es}} master nodes can form a quorum, three ZooKeeper nodes can form the quorum required for high availability.

-You should configure and use [{{es}} snapshots](/deploy-manage/tools/snapshot-and-restore.md). Snapshots provide a way to backup and restore your {{es}} indices. They can be used to copy indices for testing, to recover from failures or accidental deletions, or to migrate data to other deployments. We recommend configuring an [{{ece}}-level repository](../../tools/snapshot-and-restore/cloud-enterprise.md) to apply across all deployments. See [Work with snapshots](../../tools/snapshot-and-restore.md) for more guidance.
+Backing up the ZooKeeper data directory is also recommended. Refer to [rebuilding a broken ZooKeeper quorum](../../../troubleshoot/deployments/cloud-enterprise/rebuilding-broken-zookeeper-quorum.md) for more guidance.

-## Furthermore considerations [ece-ece-ha-5-other]
+## External resources accessibility

-* Make sure you have three Zookeepers - by default, on the Director host - for your ECE installation. Similar to three Elasticsearch master nodes can form a quorum, three Zookeepers can forum the quorum for high availability purposes. Backing up Zookeeper data directory is also recommended, read [this doc](../../../troubleshoot/deployments/cloud-enterprise/rebuilding-broken-zookeeper-quorum.md) for more guidance.
+If you’re using a [private Docker registry server](ece-install-offline-with-registry.md) or hosting any [custom bundles and plugins](../../../solutions/search/full-text/search-with-synonyms.md) on a web server, make sure these resources are accessible from all ECE allocators, so they can continue to be accessed in the event of a network partition or zone outage.

-* Make sure that if you’re using a [private Docker registry server](ece-install-offline-with-registry.md) or are using any [custom bundles and plugins](../../../solutions/search/full-text/search-with-synonyms.md) hosted on a web server, that these are available to all ECE allocators, so that they can continue to be accessed in the event of a network partition or zone outage.
+## Other recommendations

-* Don’t delete containers unless guided by Elastic Support or there’s public documentation explicitly describing this as required action. Otherwise, it can cause issues and you may lose access or functionality of your {{ece}} platform. See [Troubleshooting container engines](../../../troubleshoot/deployments/cloud-enterprise/troubleshooting-container-engines.md) for more information.
+Avoid deleting containers unless explicitly instructed by Elastic Support or official documentation. Doing so may lead to unexpected issues or loss of access to your {{ece}} platform. For more details, refer to [](/troubleshoot/deployments/cloud-enterprise/troubleshooting-container-engines.md).

-If in doubt, please [contact support for help](../../../troubleshoot/deployments/cloud-enterprise/ask-for-help.md).
+If in doubt, please [contact support for help](/troubleshoot/index.md#contact-us).

deploy-manage/deploy/cloud-enterprise/ece-networking-prereq.md

Lines changed: 4 additions & 1 deletion
@@ -23,6 +23,8 @@ When there are multiple hosts for each role, the inbound networking and ports ca

![ECE networking and ports](/deploy-manage/images/cloud-enterprise-ece-networking-ports.png "")

+**Inbound traffic from any source**
+
| **Number** | **Host role** | **Inbound ports** | *Purpose* |
| --- | --- | --- | --- |
| | All | 22 | Installation and troubleshooting SSH access only (TCP)<br> |
@@ -32,11 +34,12 @@ When there are multiple hosts for each role, the inbound networking and ports ca
| 3 | Proxy | 9400, 9443 | Elasticsearch Cross Cluster Search and Cross Cluster Replication with TLS authentication (9400) or API key authentication (9443), also required by load balancers. Can be blocked if [CCR/CCS](../../remote-clusters/ece-enable-ccs.md) is not used.<br> |
| 7 | Coordinator | 12400/12443 | Cloud UI console to API (HTTP/HTTPS)<br> |

-In addition to the following list, you should open 12898-12908 and 13898-13908 on the director host for ZooKeeper leader and election activity.
+**Inbound traffic from other ECE hosts**

| **Number** | **Host role** | **Inbound ports** | *Purpose* |
| --- | --- | --- | --- |
| 1 | Director | 2112 | ZooKeeper ensemble discovery/joining (TCP)<br> |
+| 1 | Director | 12898-12908, 13898-13908 | ZooKeeper leader and election activity |
| 4 | Director | 12191-12201 | Client forwarder to ZooKeeper, one port per director (TLS tunnels)<br> |
| 5 | Allocator | 19000-19999 | Elasticsearch node to node and Proxy to Elasticsearch for CCR/CCS (Node Transport 6.x+/TLS 6.x+)<br> |
| 7 | Coordinator | 22191-22195 | Connections to initial coordinator from allocators and proxies, one port per coordinator, up to five (TCP)<br> |

deploy-manage/deploy/cloud-on-k8s/elastic-maps-server.md

Lines changed: 5 additions & 0 deletions
@@ -35,6 +35,11 @@ The following sections describe how to customize an Elastic Maps Server deployme
* [Disable TLS](http-configuration.md#k8s-maps-http-disable-tls)
* [Ingress and Kibana configuration](http-configuration.md#k8s-maps-ingress)

+:::{admonition} Support scope for Ingress Controllers
+[Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) is a standard Kubernetes concept. While ECK-managed workloads can be publicly exposed using ingress resources, and we provide [example configurations](/deploy-manage/deploy/cloud-on-k8s/recipes.md), setting up an Ingress controller requires in-house Kubernetes expertise.
+
+If ingress configuration is challenging or unsupported in your environment, consider using standard `LoadBalancer` services as a simpler alternative.
+:::
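
As a rough illustration of the `LoadBalancer` alternative mentioned in the admonition above (a hand-written sketch, not part of this commit), an ECK-managed Elastic Maps Server can typically be exposed by overriding its HTTP Service type. The resource name and Stack version below are placeholders, and the API version is an assumption to verify against your ECK release.

```yaml
apiVersion: maps.k8s.elastic.co/v1alpha1   # assumed API version; confirm for your ECK release
kind: ElasticMapsServer
metadata:
  name: ems-sample                          # placeholder name
spec:
  version: 9.0.0                            # placeholder Elastic Stack version
  count: 1
  http:
    service:
      spec:
        # Request an external load balancer from the cloud provider instead of
        # configuring an Ingress resource in front of the HTTP Service.
        type: LoadBalancer
```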
deploy-manage/deploy/cloud-on-k8s/http-configuration.md

Lines changed: 5 additions & 0 deletions
@@ -33,5 +33,10 @@ You can disable the generation of the self-signed certificate and hence disable

To use Elastic Maps Server from your Kibana instances, you need to configure Kibana to fetch maps from your Elastic Maps Server instance by using the [`map.emsUrl`](/explore-analyze/visualize/maps/maps-connect-to-ems.md#elastic-maps-server-kibana) configuration key. The value of this setting needs to be the URL where the Elastic Maps Server instance is reachable from your browser. The certificates presented by Elastic Maps Server need to be trusted by the browser, and the URL must have the same origin as the URL where your Kibana is hosted to avoid cross origin resource issues. Check the [recipe section](https://github.com/elastic/cloud-on-k8s/tree/2.16/config/recipes/) for an example on how to set this up using an Ingress resource.

+:::{admonition} Support scope for Ingress Controllers
+[Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) is a standard Kubernetes concept. While ECK-managed workloads can be publicly exposed using ingress resources, and we provide [example configurations](/deploy-manage/deploy/cloud-on-k8s/recipes.md), setting up an Ingress controller requires in-house Kubernetes expertise.
+
+If ingress configuration is challenging or unsupported in your environment, consider using standard `LoadBalancer` services as a simpler alternative.
+:::
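
To make the `map.emsUrl` step described above concrete, here is a minimal, hypothetical sketch of an ECK-managed Kibana pointing at an Elastic Maps Server instance. The resource names, Stack version, and URL are placeholders, and the URL must meet the browser-trust and same-origin requirements described above.

```yaml
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana-sample                  # placeholder name
spec:
  version: 9.0.0                       # placeholder Elastic Stack version
  count: 1
  elasticsearchRef:
    name: elasticsearch-sample         # placeholder Elasticsearch resource
  config:
    # URL where the browser can reach Elastic Maps Server.
    map.emsUrl: "https://maps.example.com"
```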
deploy-manage/deploy/cloud-on-k8s/managing-deployments-using-helm-chart.md

Lines changed: 7 additions & 0 deletions
@@ -101,6 +101,13 @@ helm install es-quickstart elastic/eck-elasticsearch -n elastic-stack --create-n

## Adding Ingress to the Elastic stack [k8s-eck-stack-ingress]

+:::{admonition} Support scope for Ingress Controllers
+[Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) is a standard Kubernetes concept. While ECK-managed workloads can be publicly exposed using ingress resources, and we provide [example configurations](/deploy-manage/deploy/cloud-on-k8s/recipes.md), setting up an Ingress controller requires in-house Kubernetes expertise.
+
+If ingress configuration is challenging or unsupported in your environment, consider using standard `LoadBalancer` services as a simpler alternative.
+:::
+
+
Both {{es}} and {{kib}} support [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/), which can be enabled using the following options:

**If an individual chart is used (not eck-stack)**
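
For readers who want to see what such an Ingress object can look like, the following is a generic, hand-written sketch (not taken from the chart's values and not part of this commit). It assumes an NGINX ingress controller, a pre-created TLS secret, and ECK's default Service naming convention `<cluster-name>-es-http` for the `es-quickstart` cluster installed above; the hostnames and object names are placeholders.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: es-quickstart-ingress                  # placeholder name
  namespace: elastic-stack
  annotations:
    # ECK serves HTTPS on the backend Service by default.
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  ingressClassName: nginx                      # assumes an NGINX ingress controller
  tls:
    - hosts: ["elasticsearch.example.com"]     # placeholder host
      secretName: es-ingress-tls               # placeholder, pre-created TLS secret
  rules:
    - host: elasticsearch.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: es-quickstart-es-http    # ECK convention: <cluster-name>-es-http
                port:
                  number: 9200
```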
