diff --git a/deploy-manage/production-guidance/kibana-in-production-environments.md b/deploy-manage/production-guidance/kibana-in-production-environments.md index 2090c0a01e..901eb0a27a 100644 --- a/deploy-manage/production-guidance/kibana-in-production-environments.md +++ b/deploy-manage/production-guidance/kibana-in-production-environments.md @@ -14,13 +14,13 @@ How you deploy {{kib}} largely depends on your use case. If you are the only use ## Scalability -Historically, Kibana’s scalability was primarily influenced by the number of concurrent users and the complexity of dashboards and visualizations. However, with the introduction of new capabilities such as [{{kib}} Alerting](/explore-analyze/alerts-cases.md) and the [Detection Rules](/solutions/security/detect-and-alert.md) engine, critical components for [Observability](/solutions/observability.md) and [Security](/solutions/security.md) solutions, the scalability factors have evolved significantly. +With the introduction of new capabilities such as [{{kib}} Alerting](/explore-analyze/alerts-cases.md) and the [Detection Rules](/solutions/security/detect-and-alert.md) engine, critical components for [Observability](/solutions/observability.md) and [Security](/solutions/security.md) solutions, the scalability factors have evolved significantly. Now, {{kib}}'s resource requirements extend beyond user activity. The system must also handle workloads generated by automated processes, such as scheduled alerts, background detection rules, and other periodic tasks. These operations are managed by [{{kib}} Task Manager](./kibana-task-manager-scaling-considerations.md), which is responsible for scheduling, executing, and coordinating all background tasks. Additionally, Task Manager enables distributed coordination across multiple {{kib}} instances, allowing {{kib}} to function as a logical cluster in some respects. 
-::::{important} +::::{important} * {{kib}} does not support rolling [upgrades](/deploy-manage/upgrade/deployment-or-cluster/kibana.md), and deploying mixed versions of {{kib}} can result in data loss or upgrade failures. Shut down all instances of {{kib}} before performing an upgrade, and ensure all running {{kib}} instances have matching versions. * While {{kib}} isn’t resource intensive, we still recommend running {{kib}} separate from your {{es}} data or master nodes. :::: @@ -33,6 +33,8 @@ Topics covered include: * [High availability and traffic distribution](./kibana-load-balance-traffic.md): For self-managed deployments, learn how to load balance traffic across multiple {{kib}} instances, how to balance traffic to different deployments, and how to distribute {{kib}} traffic across multiple {{es}} instances. +* [Scaling {{kib}} based on traffic](./kibana-traffic-scaling-considerations.md): Learn how much CPU and memory {{kib}} needs to handle your expected traffic. + * [Configure {{kib}} memory usage](./kibana-configure-memory.md): Configure {{kib}} memory limit in self-managed deployments. * [Manage {{kib}} background tasks](./kibana-task-manager-scaling-considerations.md): Learn how {{kib}} runs background tasks like alerting and reporting, and get guidance on scaling and throughput tuning for reliable task execution. Applicable to all deployment types. diff --git a/deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md b/deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md index dca6d681a8..64f04f6403 100644 --- a/deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md +++ b/deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md @@ -65,7 +65,7 @@ For detailed troubleshooting guidance, see [Troubleshooting](../../troubleshoot/ ## Scaling guidance [task-manager-scaling-guidance] -How you deploy {{kib}} largely depends on your use case. 
Predicting the throughout a deployment might require to support Task Management is difficult because features can schedule an unpredictable number of tasks at a variety of scheduled cadences. +How you deploy {{kib}} largely depends on your use case. Predicting the throughput a deployment requires to support Task Management is difficult because features can schedule an unpredictable number of tasks at a variety of scheduled cadences. However, there is a relatively straightforward method you can follow to produce a rough estimate based on your expected usage. diff --git a/deploy-manage/production-guidance/kibana-traffic-scaling-considerations.md b/deploy-manage/production-guidance/kibana-traffic-scaling-considerations.md new file mode 100644 index 0000000000..7140515849 --- /dev/null +++ b/deploy-manage/production-guidance/kibana-traffic-scaling-considerations.md @@ -0,0 +1,117 @@ +--- +navigation_title: Traffic scaling considerations +mapped_pages: + - https://www.elastic.co/guide/en/kibana/current/kibana-traffic-scaling-considerations.html +applies_to: + deployment: + ess: all + ece: all + eck: all + self: all +products: + - id: kibana +--- + +# Scale {{kib}} for your traffic workload + +::::{important} +This guidance does not apply to scaling {{kib}} for task manager. If you intend to optimize {{kib}} for alerting capabilities, see [](./kibana-task-manager-scaling-considerations.md). +:::: + +{{kib}}'s HTTP traffic is diverse and can be unpredictable. Traffic includes serving static assets, processing large search responses from {{es}}, and managing CRUD operations against complex domain objects like SLOs. The scale of the load created by each of these kinds of traffic varies depending on your usage patterns. While this load is difficult to predict, there are two important aspects to consider when provisioning CPU and memory resources for your {{kib}} instances: + +* **Concurrency**: How many users you expect to be interacting with {{kib}} simultaneously. 
Concurrency performance is largely **CPU-bound**. Approaching this limit increases response times. +* **Request and response size**: The size of requests and responses you expect {{kib}} to service. Performance when managing large requests and responses is largely **memory-bound**. Approaching this limit increases response times and may cause {{kib}} to crash. + +::::{tip} +On [{{serverless-full}}](../deploy/elastic-cloud/serverless.md), scaling {{kib}} is fully managed for you. +:::: + +CPU and memory boundedness often interact in important ways. If CPU-bound activity is reaching its limit, memory pressure will likely increase as {{kib}} has less time for activities like garbage collection. If memory-bound activity is reaching its limit, there may be more CPU work to free claimed memory, increasing CPU pressure. [Tracking CPU and memory metrics over time](#advanced-scaling-using-stack-monitoring-metrics) can be very useful for understanding where your {{kib}} instance is experiencing a bottleneck. + +::::{note} +Traffic to {{kib}} often comes in short bursts or spikes that can overwhelm an underprovisioned {{kib}} instance. In production environments, an overwhelmed {{kib}} instance will typically return 502 or 503 error responses. + +Load balancing helps to mitigate traffic spikes by horizontally scaling your {{kib}} deployments and improving {{kib}}'s availability. To learn more about load balancing, refer to [](./kibana-load-balance-traffic.md). +:::: + +## Before you start [_before_sizing_kibana] + +{{es}} is the search engine and backing database of {{kib}}. Any performance issues in {{es}} will manifest in {{kib}}. Additionally, although Elastic works to mitigate this, {{kib}} may send requests that degrade performance on an underprovisioned {{es}} cluster. + +### Is the {{es}} cluster correctly sized? + +Follow [the production guidance for {{es}}](./elasticsearch-in-production-environments.md). + +### What requests is {{kib}} sending to {{es}}? 
+ +In user interfaces like Dashboards or Discover, you can view the full query that {{kib}} is sending to {{es}}. This is a good way to gauge the volume of data and work that a {{kib}} visualization or dashboard creates for {{es}}. Dashboards with many visualizations will generate higher load for both {{es}} and {{kib}}. + +## Basic scaling using number of concurrent users + +Follow this strategy if you know the maximum number of expected concurrent users. + +Start {{kib}} on **1 vCPU** and **2GB** of memory. This should comfortably serve 10 concurrent users performing analytics activities like browsing dashboards. + +If you are experiencing performance issues, you can scale {{kib}} vertically by adding the following resources for every 10 additional concurrent users: +* 1 vCPU +* 2GB of memory + +These amounts are a safe minimum to ensure that {{kib}} is not resource-starved for common analytics use cases. + +We recommend scaling vertically to a maximum of **8.4 vCPU** and **8GB** of memory. + +You should also combine vertical scaling with horizontal scaling to handle greater concurrency or bursty traffic. Refer to [](./kibana-load-balance-traffic.md) for guidance. + +### Scaling examples + +| Concurrent users | Minimum vCPU | Minimum memory | ECH and ECE deployment examples | +| --- | --- | --- | --- | +| 50 | 5 vCPU | 10GB | • {{kib}} size per zone of 16GB RAM and 8 vCPU in 1 availability zone (creates 2 x 8GB nodes)<br><br>• {{kib}} size per zone of 8GB RAM and up to 8 vCPU across 2 availability zones<br><br>• {{kib}} size per zone of 4GB RAM and up to 8 vCPU across 3 availability zones | +| 100 | 10 vCPU | 20GB | • {{kib}} size per zone of 24GB RAM and 12 vCPU in 1 availability zone (creates 3 x 8GB nodes)<br><br>• {{kib}} size per zone of 8GB RAM and up to 8 vCPU across 3 availability zones | + +Refer to the [guidance on adjusting {{kib}}'s allocated resources](#adjust-resource-allocations) once you have determined sizing. + +## Advanced scaling using stack monitoring metrics + +Building on the basic strategy outlined above, you can identify more precisely where {{kib}} is resource-constrained. **Self-managed** and **{{eck}}** users manage CPU and memory allocations independently and can further tailor resources based on performance metrics. + +### Gather usage information [_monitoring-kibana-metrics] + +To understand the impact of your usage patterns on a single {{kib}} instance, use the [stack monitoring](../monitor/stack-monitoring.md) feature. + +Using stack monitoring, you can gather the following metrics for your {{kib}} instance: + +* **Event loop delay (ELD) in milliseconds:** A Node.js concept that roughly translates to the number of milliseconds by which processing of events is delayed due to CPU-intensive activities. +* **Heap size in bytes:** The number of bytes currently held in memory dedicated to {{kib}}'s heap space. +* **HTTP connections:** The number of sockets that the {{kib}} server has open. + +### Scale CPU using ELD metrics [kibana-traffic-load-cpu-sizing] + +Event loop delay (ELD) is an important metric for understanding whether {{kib}} is engaged in CPU-bound activity. + +**As a general target, ELD should be below ~220ms 95% of the time**. Higher delays may mean {{kib}} is CPU-starved. Sporadic increases above this target may mean that {{kib}} is periodically processing CPU-intensive activities like large responses from {{es}}, whereas consistently high ELD may mean {{kib}} is struggling to service tasks and requests. + +Before increasing CPU resources, consider the impact of ELD on user experience. If users can work in {{kib}} without the slowdowns caused by a blocked event loop, provisioning additional CPU resources will have little impact, although having spare headroom for unexpected spikes is useful. 
+ +Monitoring {{kib}}'s ELD over time is a solid strategy for knowing when additional CPU resources are needed based on your usage patterns. + +Refer to the [guidance on adjusting {{kib}}'s allocated resources](#adjust-resource-allocations) once you have determined vCPU sizing. + +### Scale memory using heap size metrics [kibana-traffic-load-memory-sizing] + +Heap size is an important metric to track. If {{kib}}'s heap size grows beyond the heap limit, {{kib}} will crash. By monitoring heap size, you can help ensure that {{kib}} has enough memory available. + +Self-managed users must provision memory to the host that {{kib}} is running on as well as configure the allocated heap. See [the guidance on configuring {{kib}} memory](./kibana-configure-memory.md). + +Refer to the [guidance on adjusting {{kib}}'s allocated resources](#adjust-resource-allocations) once you have determined memory sizing. + +## Adjust resource allocations for {{kib}} [adjust-resource-allocations] + +The way that you alter the resources allocated to your {{kib}} instance depends on your deployment type: +* **[{{ech}}](/deploy-manage/deploy/elastic-cloud/ec-customize-deployment-components.md) and [{{ece}}](/deploy-manage/deploy/elastic-cloud/configure.md):** Users can adjust {{kib}}'s memory by viewing their deployment and editing the {{kib}} instance's resource configuration. In these environments, size increments are predetermined. +* **{{eck}}:** Users can configure pod memory and CPU resources. Refer to [](../deploy/cloud-on-k8s/manage-compute-resources.md). +* **Self-managed:** Users must provision memory to the host that {{kib}} is running on as well as configure the allocated heap. See [the guidance on configuring {{kib}} memory](./kibana-configure-memory.md). + +:::{note} +For {{eck}} and self-managed deployments, a general guideline for Node.js processes is to allocate 80% of available host memory to the heap, assuming that {{kib}} is the only server process running on the (virtual) host. 
This leaves memory available for other activities, such as allocating HTTP sockets. +::: \ No newline at end of file diff --git a/deploy-manage/toc.yml b/deploy-manage/toc.yml index 9858cfdf31..5e5b188629 100644 --- a/deploy-manage/toc.yml +++ b/deploy-manage/toc.yml @@ -16,8 +16,8 @@ toc: - hidden: deploy/elastic-cloud/azure-marketplace-pricing.md - hidden: deploy/elastic-cloud/create-monthly-pay-as-you-go-subscription-on-gcp-marketplace.md - file: deploy/elastic-cloud/aws-marketplace.md - - file: deploy/elastic-cloud/azure-native-isv-service.md - - file: deploy/elastic-cloud/google-cloud-platform-marketplace.md + - file: deploy/elastic-cloud/azure-native-isv-service.md + - file: deploy/elastic-cloud/google-cloud-platform-marketplace.md - file: deploy/elastic-cloud/heroku.md children: - file: deploy/elastic-cloud/heroku-getting-started-installing.md @@ -47,7 +47,7 @@ toc: children: - file: deploy/elastic-cloud/ec-change-hardware-profile.md children: - - file: deploy/elastic-cloud/change-hardware.md + - file: deploy/elastic-cloud/change-hardware.md - file: deploy/elastic-cloud/ec-customize-deployment-components.md - file: deploy/elastic-cloud/edit-stack-settings.md - file: deploy/elastic-cloud/add-plugins-extensions.md @@ -358,6 +358,7 @@ toc: - file: production-guidance/kibana-load-balance-traffic.md - file: production-guidance/kibana-configure-memory.md - file: production-guidance/kibana-task-manager-scaling-considerations.md + - file: production-guidance/kibana-traffic-scaling-considerations.md - file: production-guidance/kibana-alerting-production-considerations.md - file: production-guidance/kibana-reporting-production-considerations.md - file: reference-architectures.md @@ -452,7 +453,7 @@ toc: - file: autoscaling/autoscaling-deciders.md - file: autoscaling/trained-model-autoscaling.md - file: security.md - children: + children: - file: security/secure-hosting-environment.md children: - file: 
security/secure-your-elastic-cloud-enterprise-installation.md @@ -578,7 +579,7 @@ toc: - file: users-roles/cluster-or-deployment-auth/pki.md - file: users-roles/cluster-or-deployment-auth/custom.md - file: users-roles/cluster-or-deployment-auth/built-in-users.md - children: + children: - file: users-roles/cluster-or-deployment-auth/built-in-sm.md - file: users-roles/cluster-or-deployment-auth/orchestrator-managed-users-overview.md children: @@ -610,7 +611,7 @@ toc: - file: users-roles/cluster-or-deployment-auth/kibana-role-management.md - file: users-roles/cluster-or-deployment-auth/role-restriction.md - title: "Elasticsearch privileges" - crosslink: elasticsearch://reference/elasticsearch/security-privileges.md + crosslink: elasticsearch://reference/elasticsearch/security-privileges.md - file: users-roles/cluster-or-deployment-auth/kibana-privileges.md - file: users-roles/cluster-or-deployment-auth/mapping-users-groups-to-roles.md children: @@ -704,7 +705,7 @@ toc: children: - file: monitor/stack-monitoring/kibana-monitoring-elastic-agent.md - file: monitor/stack-monitoring/kibana-monitoring-metricbeat.md - - file: monitor/stack-monitoring/kibana-monitoring-legacy.md + - file: monitor/stack-monitoring/kibana-monitoring-legacy.md - file: monitor/stack-monitoring/kibana-monitoring-data.md - file: monitor/monitoring-data/visualizing-monitoring-data.md children: @@ -803,7 +804,7 @@ toc: - file: upgrade/plan-upgrade.md - file: upgrade/prepare-to-upgrade.md children: - - file: upgrade/prepare-to-upgrade/upgrade-assistant.md + - file: upgrade/prepare-to-upgrade/upgrade-assistant.md - file: upgrade/deployment-or-cluster.md children: - file: upgrade/deployment-or-cluster/upgrade-on-ech.md @@ -819,8 +820,8 @@ toc: children: - file: upgrade/deployment-or-cluster/saved-object-migrations.md - file: upgrade/deployment-or-cluster/kibana-roll-back.md - - file: upgrade/deployment-or-cluster/enterprise-search.md - - file: upgrade/ingest-components.md + - file: 
upgrade/deployment-or-cluster/enterprise-search.md + - file: upgrade/ingest-components.md - file: upgrade/orchestrator.md children: - file: upgrade/orchestrator/upgrade-cloud-enterprise.md