production guidance work started. WIP

eedugon · eedugon · commit 6b848d4aaee4 · 2025-03-14T22:56:07.000+01:00
diff --git a/deploy-manage/production-guidance.md b/deploy-manage/production-guidance.md
@@ -3,9 +3,110 @@ mapped_pages:
   - https://www.elastic.co/guide/en/cloud/current/ec-best-practices-data.html
   - https://www.elastic.co/guide/en/elasticsearch/reference/current/scalability.html
 ---
+$$$ec-best-practices-data$$$
+# Production guidance [scalability]
 
-# Production guidance [ec-best-practices-data]
+% start bringing https://www.elastic.co/guide/en/elasticsearch/reference/current/scalability.html here
+% try to merge https://www.elastic.co/guide/en/cloud/current/ec-planning.html and https://www.elastic.co/guide/en/cloud/current/ec-best-practices-data.html
+% mention all deployment types! what the user needs to be aware for orchestrated deployments.
 
+Many teams rely on {{es}} to run their key services. To keep these services running, you can design your {{es}} deployment to keep {{es}} available, even in case of large-scale outages. To keep it running fast, you can also design your deployment to be responsive to production workloads.
+
+{{es}} is built to be always available and to scale with your needs. It does this using a [distributed architecture](./distributed-architecture.md). By distributing your cluster, you can keep Elastic online and responsive to requests.
+
+In case of failure, {{es}} offers tools for [cross-cluster replication](./tools/cross-cluster-replication.md) and [cluster snapshots](./tools/snapshot-and-restore.md) that can help you fall back or recover quickly. You can also use cross-cluster replication to serve requests based on the geographic location of your users and your resources.
+
+% not very relevant
+{{es}} also offers security and monitoring tools to help you keep your cluster highly available.
+
+
+Recommendations out there:
+- Use multiple nodes and shards
+
+## Section overview
+
+Running the {{stack}} in production requires careful planning to ensure resilience, performance, and scalability. This section outlines best practices and recommendations for optimizing {{es}} and {{kib}} in production environments.
+
+* High availability (HA) and resilience
+  * Resilience in small clusters
+  * Resilience in larger clusters
+
+* Performance optimizations:
+  * Elasticsearch
+    * General recommendations
+    * Tune for indexing speed
+    * Tune for search speed
+    * Tune for approximate kNN search
+    * Tune for disk usage
+    * Size your shards
+    * Use {{es}} for time series data
+  * Kibana
+    * Kibana task manager scaling considerations
+    * Kibana alerting
+
+* Scaling
+
+For additional production-critical topics, refer to:
+
+* [](./security.md)
+
+* [](./tools.md)
+
+* [](./monitor.md)
+
+
+
+(Regardless of whether you are running a hosted or a self-managed deployment, the content of this section helps you understand and make key decisions when designing your clusters in the following areas:)
+
+Cluster design has:
+- design for resilience
+- tune for xxx
+- tune for xxx
+
+
+
+## Deployment types
+
+These concepts aren’t essential if you’re just getting started. How you [deploy {{es}}](/get-started/deployment-options.md) in production determines what you need to know:
+
+* **Self-managed {{es}}**: You are responsible for setting up and managing nodes, clusters, shards, and replicas. This includes managing the underlying infrastructure, scaling, and ensuring high availability through failover and backup strategies.
+* **Elastic Cloud**: Elastic can autoscale resources in response to workload changes. Choose from different deployment types to apply sensible defaults for your use case. A basic understanding of nodes, shards, and replicas is still important.
+* **Elastic Cloud Serverless**: You don’t need to worry about nodes, shards, or replicas. These resources are 100% automated on the serverless platform, which is designed to scale with your workload.
+
+(add ECE and ECK)
+
+
+% discarded text (from ECH best practices)
+
+## HA and Resilience
+
+{{es}} and {{kib}} provide mechanisms for HA and resilience
+
+
+### Use multiple nodes and shards
+
+### CCR for disaster recovery and geo-proximity
+
+## Performance tuning [cluster-design]
+
+{{es}} offers many options that allow you to configure your cluster to meet your organization’s goals, requirements, and restrictions. You can review the following guides to learn how to tune your cluster to meet your needs:
+
+::::{note}
+In orchestrated deployments, some of the settings mentioned in this section are not applicable. Refer to each section header to understand whether it is applicable to your deployment type.
+::::
+
+* [Designing for resilience](availability-and-resilience.md)
+* [Tune for indexing speed](optimize-performance/indexing-speed.md)
+* [Tune for search speed](optimize-performance/search-speed.md)
+* [Tune for disk usage](optimize-performance/disk-usage.md)
+* [Tune for time series data](../../manage-data/use-case-use-elasticsearch-to-manage-time-series-data.md)
+
+Many {{es}} options come with different performance considerations and trade-offs. The best way to determine the optimal configuration for your use case is through [testing with your own data and queries](https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing).
+
+
+## Scaling
+
+% from https://www.elastic.co/guide/en/cloud/current/ec-planning.html ?
 This section provides some best practices for managing your data to help you set up a production environment that matches your workloads, policies, and deployment needs.
 
 
diff --git a/deploy-manage/production-guidance/getting-ready-for-production-elasticsearch.md b/deploy-manage/production-guidance/getting-ready-for-production-elasticsearch.md
@@ -56,6 +56,7 @@ As with any enterprise system, you need tools to secure, manage, and monitor you
 
 
 ## Cluster design [cluster-design] 
+% moved to landing page.
 
 {{es}} offers many options that allow you to configure your cluster to meet your organization’s goals, requirements, and restrictions. You can review the following guides to learn how to tune your cluster to meet your needs:
 
diff --git a/deploy-manage/production-guidance/kibana-in-production-environments.md b/deploy-manage/production-guidance/kibana-in-production-environments.md
@@ -1,11 +1,9 @@
 ---
-navigation_title: "Production considerations"
+navigation_title: "Run Kibana in production"
 mapped_pages:
   - https://www.elastic.co/guide/en/kibana/current/production.html
 ---
 
-
-
 # Kibana in production environments [production]
 
 
@@ -22,7 +20,11 @@ While {{kib}} isn’t terribly resource intensive, we still recommend running {{
 
 ## Load balancing across multiple {{kib}} instances [load-balancing-kibana] 
 
-To serve multiple {{kib}} installations behind a load balancer, you must change the configuration. See [Configuring {{kib}}](../deploy/self-managed/configure.md) for details on each setting.
+To serve multiple {{kib}} instances from the same deployment behind a load balancer, you must change the configuration. See [Configuring {{kib}}](../deploy/self-managed/configure.md) for details on each setting.
+
+::::{note}
+In orchestrated deployments such as {{ech}}, {{ece}}, and {{eck}}, the necessary configuration for multiple {{kib}} instances within the same deployment is automatically managed by the orchestrator. This process is transparent to the user, requiring no manual configuration.
+::::
 
 These settings must be unique across each {{kib}} instance:
 
@@ -70,8 +72,13 @@ bin/kibana -c config/instance2.yml
 
 
 ## Accessing multiple load-balanced {{kib}} clusters [accessing-load-balanced-kibana] 
-
+% TBD, WIP
 To access multiple load-balanced {{kib}} clusters from the same browser, explicitly set `xpack.security.cookieName` to the same value in the {{kib}} configuration of each {{kib}} instance.
+To access different load-balanced {{kib}} deployments from the same browser, explicitly set `xpack.security.cookieName` to the same value in the {{kib}} configuration of each {{kib}} instance.
+
+To access different load-balanced {{kib}} instances within the same deployment from the same browser, explicitly set `xpack.security.cookieName` to the same value in the configuration of each instance.
+
+Configure a different value of `xpack.security.cookieName` in {{kib}} instances belonging to other deployments.
 
 Each {{kib}} cluster must have a different value of `xpack.security.cookieName`.
 
diff --git a/deploy-manage/production-guidance/optimize-performance.md b/deploy-manage/production-guidance/optimize-performance.md
@@ -17,5 +17,7 @@ This section provides recommendations for various use cases.
 * [*Tune approximate kNN search*](optimize-performance/approximate-knn-search.md)
 * [*Tune for disk usage*](optimize-performance/disk-usage.md)
 * [*Size your shards*](optimize-performance/size-shards.md)
-* [*Use {{es}} for time series data*](../../manage-data/use-case-use-elasticsearch-to-manage-time-series-data.md)
+
+% this one has been moved to manage data, not sure if it makes sense to mention here, as it's not about performance or prod recommendations
+% * [*Use {{es}} for time series data*](../../manage-data/use-case-use-elasticsearch-to-manage-time-series-data.md)
 
diff --git a/deploy-manage/production-guidance/plan-for-production-elastic-cloud.md b/deploy-manage/production-guidance/plan-for-production-elastic-cloud.md
@@ -4,7 +4,7 @@ mapped_urls:
   - https://www.elastic.co/guide/en/cloud-heroku/current/ech-planning.html
 ---
 
-# Plan for production (Elastic Cloud)
+# Plan for production (Elastic Cloud) [ec-planning]
 
 % What needs to be done: Refine
 
@@ -28,4 +28,64 @@ $$$ech-workloads$$$
 **This page is a work in progress.** The documentation team is working to combine content pulled from the following pages:
 
 * [/raw-migrated-files/cloud/cloud/ec-planning.md](/raw-migrated-files/cloud/cloud/ec-planning.md)
-* [/raw-migrated-files/cloud/cloud-heroku/ech-planning.md](/raw-migrated-files/cloud/cloud-heroku/ech-planning.md)
+* [/raw-migrated-files/cloud/cloud-heroku/ech-planning.md](/raw-migrated-files/cloud/cloud-heroku/ech-planning.md)
+
+
+{{ech}} supports a wide range of configurations. With such flexibility comes great freedom, but also the first rule of deployment planning: Your deployment needs to be matched to the workloads that you plan to run on your {{es}} clusters and {{kib}} instances. Specifically, this means two things:
+
+* [Does your data need to be highly available?](../../../deploy-manage/production-guidance/plan-for-production-elastic-cloud.md#ec-ha)
+* [Do you know when to scale?](../../../deploy-manage/production-guidance/plan-for-production-elastic-cloud.md#ec-workloads)
+
+
+## Does your data need to be highly available? [ec-ha] 
+
+With {{ech}}, your deployment can be spread across as many as three separate availability zones, each hosted in its own, separate data center. Why this matters:
+
+* Data centers can have issues with availability. Internet outages, earthquakes, floods, or other events could affect the availability of a single data center. With a single availability zone, you have a single point of failure that can bring down your deployment.
+* Multiple availability zones help your deployment remain available. This includes your {{es}} cluster, provided that your cluster is sized so that it can sustain your workload on the remaining data centers and that your indices are configured to have at least one replica.
+* Multiple availability zones enable you to perform changes to resize your deployment with zero downtime.
+
+We recommend that you use at least two availability zones for production and three for mission-critical systems. Just one zone might be sufficient, if your {{es}} cluster is mainly used for testing or development and downtime is acceptable, but should never be used for production.
+
+With multiple {{es}} nodes in multiple availability zones, you have the recommended hardware. The next thing to consider is having the recommended index replication. Each index, with the exception of searchable snapshot indices, should have one or more replicas. Use the index settings API to find any indices with no replica:
+
+```sh
+GET _all/_settings/index.number_of_replicas
+```
+
+Moreover, a high availability (HA) cluster requires at least three master-eligible nodes. For clusters that have fewer than six {{es}} nodes, any data node in the hot tier will also be a master-eligible node. You can achieve this by having hot nodes (serving as both data and master-eligible nodes) in three availability zones, or by having data nodes in two zones and a tiebreaker (which is added automatically if you choose two zones). For clusters that have six {{es}} nodes and beyond, dedicated master-eligible nodes are introduced. When your cluster grows, consider separating dedicated master-eligible nodes from dedicated data nodes. We recommend using at least 4GB RAM for dedicated master nodes.
+
+The data in your {{es}} clusters is also backed up every 30 minutes, 4 hours, or 24 hours, depending on which snapshot interval you choose. These regular intervals provide an extra level of redundancy. We support [snapshot and restore](../../../deploy-manage/tools/snapshot-and-restore.md), regardless of whether you use one, two, or three availability zones. However, with only a single availability zone and in the event of an outage, it might take a while for your cluster to come back online. Using a single availability zone also leaves your cluster exposed to the risk of data loss if the backups you need are not usable (failed or partial snapshots missing the indices to restore) or are no longer available by the time you realize that you need the data (snapshots have a retention policy).
+
+::::{warning} 
+Clusters that use only one availability zone are not highly available and are at risk of data loss. To safeguard against data loss, you must use at least two availability zones.
+::::
+
+
+::::{warning} 
+Indices with no replica, except for searchable snapshot indices, are not highly available. You should use replicas to mitigate against possible data loss.
+::::
+
+
+::::{warning} 
+Clusters that only have one master node are not highly available and are at risk of data loss. You must have three master-eligible nodes.
+::::
+
+
+
+## Do you know when to scale? [ec-workloads] 
+
+Knowing how to scale your deployment is critical, especially when unexpected workloads hit. Don’t forget to [check your performance metrics](../../../deploy-manage/monitor/monitoring-data/ec-saas-metrics-accessing.md) to make sure your deployments are healthy and can cope with your workloads.
+
+Scaling with {{ech}} is easy:
+
+* Turn on [deployment autoscaling](../../../deploy-manage/autoscaling.md) to let {{ecloud}} manage your deployments by adjusting their available resources automatically.
+* Or, if you prefer manual control, log in to the [{{ecloud}} Console](https://cloud.elastic.co?page=docs&placement=docs-body), select your deployment, select **Edit deployment** from the **Actions** dropdown, and either increase the number of zones or the size per zone.
+
+::::{warning} 
+Increasing the number of zones should not be used to add more resources. The concept of zones is meant for High Availability (2 zones) and Fault Tolerance (3 zones), but neither will work if the cluster relies on the resources from those zones to be operational. The recommendation is to scale up the resources within a single zone until the cluster can take the full load (add some buffer to be prepared for a peak of requests), then scale out by adding additional zones depending on your requirements: 2 zones for High Availability, 3 zones for Fault Tolerance.
+::::
+
+
+Refer to [Sizing {{es}}: Scaling up and out](https://www.elastic.co/blog/found-sizing-elasticsearch) to identify which questions to ask yourself when determining which cluster size is the best fit for your {{es}} use case.
+
diff --git a/deploy-manage/toc.yml b/deploy-manage/toc.yml
@@ -367,22 +367,23 @@ toc:
   - file: production-guidance.md
     children:
       - file: production-guidance/getting-ready-for-production-elasticsearch.md
-      - file: production-guidance/kibana-in-production-environments.md
       - file: production-guidance/plan-for-production-elastic-cloud.md
+      - file: production-guidance/kibana-in-production-environments.md
+        children:
+          - file: production-guidance/kibana-task-manager-scaling-considerations.md
+          - file: production-guidance/kibana-alerting-production-considerations.md
       - file: production-guidance/availability-and-resilience.md
         children:
           - file: production-guidance/availability-and-resilience/resilience-in-small-clusters.md
           - file: production-guidance/availability-and-resilience/resilience-in-larger-clusters.md
       - file: production-guidance/optimize-performance.md
         children:
+          - file: production-guidance/general-recommendations.md
           - file: production-guidance/optimize-performance/indexing-speed.md
           - file: production-guidance/optimize-performance/search-speed.md
           - file: production-guidance/optimize-performance/approximate-knn-search.md
           - file: production-guidance/optimize-performance/disk-usage.md
           - file: production-guidance/optimize-performance/size-shards.md
-      - file: production-guidance/kibana-task-manager-scaling-considerations.md
-      - file: production-guidance/kibana-alerting-production-considerations.md
-      - file: production-guidance/general-recommendations.md
   - file: reference-architectures.md
     children:
       - file: reference-architectures/hotfrozen-high-availability.md
diff --git a/raw-migrated-files/cloud/cloud-heroku/ech-planning.md b/raw-migrated-files/cloud/cloud-heroku/ech-planning.md
diff --git a/raw-migrated-files/cloud/cloud/ec-planning.md b/raw-migrated-files/cloud/cloud/ec-planning.md
diff --git a/raw-migrated-files/toc.yml b/raw-migrated-files/toc.yml
