Closed
Changes from 56 commits
64 commits
e3ef7a6
updates
georgewallace Oct 25, 2024
87d6282
updates
georgewallace Oct 25, 2024
cbb3024
updating images
georgewallace Oct 25, 2024
4231658
setting up toc
georgewallace Oct 25, 2024
765cb44
correcting toc
georgewallace Oct 25, 2024
b5c46bf
doc fixes
georgewallace Oct 25, 2024
6763907
updates to docs for shard management
georgewallace Oct 25, 2024
6a20ece
removed whitespace from diagram
georgewallace Oct 25, 2024
7d7f584
updating note
georgewallace Oct 25, 2024
f4fe202
moved common content out of architectures
georgewallace Oct 25, 2024
d77a166
updates
georgewallace Oct 25, 2024
96bf4bf
moving more content
georgewallace Oct 25, 2024
7f26b92
fixing issues
georgewallace Oct 29, 2024
eecf439
updates
georgewallace Oct 29, 2024
e6bb795
updates
georgewallace Oct 29, 2024
1d642a2
updates
georgewallace Oct 29, 2024
c3ab084
updates
georgewallace Oct 29, 2024
de9669b
fixing hierarchy
georgewallace Oct 29, 2024
c6c9763
updates
georgewallace Oct 29, 2024
d72e8c6
updateS
georgewallace Oct 29, 2024
993ab70
updates
georgewallace Oct 29, 2024
8caa624
updates with new titles
georgewallace Oct 29, 2024
9422a2c
more updates
georgewallace Oct 29, 2024
bed222d
fixing index settings
georgewallace Oct 29, 2024
f200b99
multiple updates
georgewallace Oct 30, 2024
380bffd
copy edit updates from liam
georgewallace Oct 30, 2024
e7e356b
updates
georgewallace Oct 30, 2024
19a5723
fixing cloud architecture
georgewallace Oct 30, 2024
156e627
addressing Liams feedback
georgewallace Oct 30, 2024
9fefada
removing extra ref archs
georgewallace Nov 1, 2024
8fa0a4a
putting images back
georgewallace Nov 1, 2024
bcf58ef
fixing index
georgewallace Nov 1, 2024
bb9e29a
updates per brads feedback
georgewallace Nov 1, 2024
fdf514e
updates for brad
georgewallace Nov 1, 2024
93b38e2
updating rachitecture descriptions
georgewallace Nov 1, 2024
fe34296
updates
georgewallace Nov 5, 2024
e3b4fc3
updates
georgewallace Nov 8, 2024
61bafef
updates
georgewallace Nov 8, 2024
a15bac4
fixing issue with build
georgewallace Nov 8, 2024
009b89b
correcting ids
georgewallace Nov 11, 2024
f414e57
updates
georgewallace Nov 11, 2024
1f0b2bf
updates
georgewallace Nov 11, 2024
81c5ca7
updating imageS
georgewallace Nov 11, 2024
13f7eb7
updates
georgewallace Nov 11, 2024
5438d06
updates
georgewallace Nov 12, 2024
a5f4ca3
updates
georgewallace Nov 13, 2024
113d1ec
changes:
georgewallace Nov 13, 2024
c7fe1ce
removing index to make it just show architectures
georgewallace Nov 13, 2024
8a899aa
fixing
georgewallace Nov 13, 2024
cb11b06
updates
georgewallace Nov 13, 2024
2d58722
fixing hot image
georgewallace Nov 13, 2024
a51630d
adding coming soon
georgewallace Nov 13, 2024
00f5133
updates based on feedback
georgewallace Nov 22, 2024
8c4cfd4
updates
georgewallace Nov 25, 2024
f313798
updateS
georgewallace Dec 3, 2024
80e9e22
updateS
georgewallace Dec 3, 2024
c286cbd
Apply suggestions from code review
georgewallace Dec 4, 2024
4b92e1b
Apply suggestions from code review
georgewallace Dec 4, 2024
5269683
Apply suggestions from code review
georgewallace Dec 4, 2024
94b4c7a
Apply suggestions from code review
georgewallace Dec 4, 2024
658ad64
Apply suggestions from code review
georgewallace Dec 4, 2024
940b632
Update docs/reference/reference-architectures/hot-frozen.asciidoc
georgewallace Dec 4, 2024
2924efb
correcting typo
georgewallace Dec 6, 2024
e9c6794
updates
georgewallace Dec 18, 2024
2 changes: 2 additions & 0 deletions docs/reference/index.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,8 @@ include::snapshot-restore/index.asciidoc[]

// reference

include::reference-architectures/index.asciidoc[]

include::rest-api/index.asciidoc[]

include::commands/index.asciidoc[]
Expand Down
152 changes: 152 additions & 0 deletions docs/reference/reference-architectures/hot-frozen.asciidoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,152 @@
[[hot-frozen-architecture]]
== Hot / Frozen - High Availability

The Hot / Frozen high availability architecture is cost optimized for large time-series datasets. In this architecture, the hot tier is primarily used for indexing, searching, and continuity for automated processes. https://www.elastic.co/guide/en/elasticsearch/reference/current/searchable-snapshots.html[Searchable snapshots] are taken from hot into a repository, such as a cloud Object Store or an on-premesis shared filesystem, and then cached to any desired volume on the local disks of the Frozen tier. Data in the repository is indexed for fast retrieval and accessed on-demand from the Frozen nodes. Index and Snapshot lifecycle management are used to automate this process.
@kilfoyle kilfoyle (Contributor) Dec 4, 2024

Suggested change
The Hot / Frozen high availability architecture is cost optimized for large time-series datasets. In this architecture, the hot tier is primarily used for indexing, searching, and continuity for automated processes. https://www.elastic.co/guide/en/elasticsearch/reference/current/searchable-snapshots.html[Searchable snapshots] are taken from hot into a repository, such as a cloud Object Store or an on-premesis shared filesystem, and then cached to any desired volume on the local disks of the Frozen tier. Data in the repository is indexed for fast retrieval and accessed on-demand from the Frozen nodes. Index and Snapshot lifecycle management are used to automate this process.
The Hot / Frozen High Availability architecture is cost optimized for large time-series datasets. In this architecture, the hot tier is primarily used for indexing, searching, and continuity for automated processes. https://www.elastic.co/guide/en/elasticsearch/reference/current/searchable-snapshots.html[Searchable snapshots] are taken from hot into a repository, such as a cloud object store or an on-premise shared filesystem, and then cached to any desired volume on the local disks of the frozen tier. Data in the repository is indexed for fast retrieval and accessed on-demand from the frozen nodes. Index and snapshot lifecycle management are used to automate this process.

I checked the Elasticsearch docs and it seems we don't capitalize the data tier types nor ILM/SLM.

I would capitalize "Hot / Frozen High Availability" for consistency with line 12.


This architecture is ideal for time-series use cases, such as Observability or Security, that do not require updating. All the necessary components of the Elastic Stack are included and this is not intended for sizing workloads, but rather as a basis to ensure your cluster is ready to handle any desired workload with resiliency. A very high level representation of data flow is included, and for more detail around ingest architecture see our https://www.elastic.co/guide/en/ingest/current/use-case-arch.html[ingest architectures] documentation.
@kilfoyle kilfoyle (Contributor) Dec 4, 2024

Suggested change
This architecture is ideal for time-series use cases, such as Observability or Security, that do not require updating. All the necessary components of the Elastic Stack are included and this is not intended for sizing workloads, but rather as a basis to ensure your cluster is ready to handle any desired workload with resiliency. A very high level representation of data flow is included, and for more detail around ingest architecture see our https://www.elastic.co/guide/en/ingest/current/use-case-arch.html[ingest architectures] documentation.
This architecture is ideal for time-series use cases, such as Observability or Security, that do not require updating. All the necessary components of the {stack} are included and this is not intended for sizing workloads, but rather as a basis to ensure your cluster is ready to handle any desired workload with resiliency. A very high level representation of data flow is included, and for more detail around ingest architecture see our {ingest-guide}/use-case-arch.html[ingest architectures] documentation.


GLOBAL: {ingest-guide}/use-case-arch.html[ingest architectures] documentation.


[discrete]
[[hot-frozen-use-case]]
=== Use case

This Hot / Frozen - High Availability architecture is intended for organizations that:

* Have a requirement for cost-effective, long-term data storage (many months or years).
* Provide insights and alerts using logs, metrics, traces, or various event types to ensure optimal performance and quick issue resolution for applications.
* Apply https://www.elastic.co/guide/en/kibana/current/xpack-ml-anomalies.html[machine learning anomaly detection] to help detect patterns in time series data to find root cause and resolve problems faster.
* Use an AI assistant (https://www.elastic.co/guide/en/observability/current/obs-ai-assistant.html[Observability], https://www.elastic.co/guide/en/security/current/security-assistant.html[Security], or https://www.elastic.co/guide/en/kibana/current/playground.html[Playground]) for investigation, incident response, reporting, query generation, or query conversion from other languages using natural language.
* Deploy an architecture model that allows for maximum flexibility between storage cost and performance.

[IMPORTANT]
====
**Automated operations that frequently read large data volumes require both high availability (replicas) and predictable low latency (hot, warm or cold tier).**

* Common examples of these tasks include look-back windows on security detection/alert rules, transforms, machine learning jobs, or watches; and long running scroll queries or external extract processes.
* These operations should be completed before moving the data into a frozen tier.
====
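The hot-to-frozen lifecycle automation described above can be expressed as an ILM policy. The following is a minimal sketch, assuming a registered snapshot repository named `my_repository`; the rollover and age thresholds are illustrative and should be tuned to your retention requirements:

```console
PUT _ilm/policy/hot-frozen
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "30d"
          }
        }
      },
      "frozen": {
        "min_age": "90d",
        "actions": {
          "searchable_snapshot": {
            "snapshot_repository": "my_repository"
          }
        }
      }
    }
  }
}
```

The `min_age` of 90 days here reflects the guidance above: it keeps data on hot nodes long enough for look-back windows and other automated reads to complete before the data moves to the frozen tier.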

[discrete]
[[hot-frozen-architecture-diagram]]
=== Architecture

image::images/hot-frozen.png["A Hot/Frozen Highly available architecture"]

TIP: We use an Availability Zone (AZ) concept in the architecture above. When running in your own Data Center (DC) you can equate AZs to failure zones within a datacenter, racks, or even separate physical machines depending on your constraints.

The diagram illustrates an Elasticsearch cluster deployed across three availability zones (AZs). For production we recommend a minimum of two availability zones, and three availability zones for mission-critical applications. See https://www.elastic.co/guide/en/cloud/current/ec-planning.html[Plan for Production] for more details. A cluster running in Elastic Cloud with data nodes in only two AZs will create a third master-eligible node in a third AZ. True real-time high availability cannot be achieved without three zones for any distributed computing technology.

The number of data nodes shown for each tier (hot and frozen) is illustrative and would be scaled up depending on ingest volume and retention period. Hot nodes contain both primary and replica shards. By default, primary and replica shards are always guaranteed to be in different availability zones in Elasticsearch Service, but when self-deploying https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-cluster.html#shard-allocation-awareness[shard allocation awareness] would need to be configured. Frozen nodes act as a large high-speed cache and retrieve data from the snapshot store as needed.
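For self-managed deployments, the shard allocation awareness mentioned above is configured by tagging each node with its zone and then telling the cluster to spread shard copies across that attribute. A minimal sketch (the attribute and zone names are illustrative):

```console
# In elasticsearch.yml on each node, tag the node with its zone, e.g.:
#   node.attr.zone: zone-1

# Then balance primary and replica copies across zones:
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "zone"
  }
}
```

With this setting in place, Elasticsearch will not allocate a replica to the same zone as its primary while another zone has capacity, mirroring the AZ guarantees that Elasticsearch Service provides by default.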

Machine learning nodes are optional but highly recommended for large scale time series use cases since the amount of data quickly becomes too difficult to analyze. Applying techniques such as machine learning based anomaly detection or Search AI with large language models helps to dramatically speed up problem identification and resolution.

[discrete]
[[hot-frozen-hardware]]
=== Recommended hardware specifications

Elastic Cloud allows you to deploy clusters in AWS, Azure and Google Cloud. Available hardware types and configurations vary across all three cloud providers but each provides instance types that meet our recommendations for the node types used in this architecture. For more details on these instance types, see our documentation on Elastic Cloud hardware for https://www.elastic.co/guide/en/cloud/current/ec-default-aws-configurations.html[AWS], https://www.elastic.co/guide/en/cloud/current/ec-default-azure-configurations.html[Azure], and https://www.elastic.co/guide/en/cloud/current/ec-default-gcp-configurations.html[GCP]. The **Physical** column below is guidance, based on the cloud node types, when self-deploying Elasticsearch in your own data center.

In the links provided above, Elastic has performance-tested hardware for each of the cloud providers to find the optimal hardware for each node type. We use ratios to represent the best mix of CPU, RAM, and disk for each type. In some cases the CPU-to-RAM ratio is key; in others the disk-to-memory ratio and the type of disk are critical. Significantly deviating from these ratios may look like a way to save on hardware costs, but may result in an Elasticsearch cluster that does not scale and perform well.

The following table shows our specific recommendations for nodes in Hot / Frozen architecture.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The following table shows our specific recommendations for nodes in Hot / Frozen architecture.
The following table shows our specific recommendations for nodes in a Hot / Frozen architecture.


|===
| **Type** | **AWS** | **Azure** | **GCP** | **Physical**

| image:images/hot.png["Hot data node"]
| c6gd
| f32sv2
| N2
| 16-32 vCPU +
64 GB RAM +
2-6 TB NVMe SSD

| image:images/frozen.png["Frozen data node"]
| i3en
| e8dsv4
| N2
| 8 vCPU +
64 GB RAM +
6-20+ TB NVMe SSD +
Depending on days cached

| image:images/machine-learning.png["Machine learning node"]
| m6gd
| f16sv2
| N2
| 16 vCPU +
64 GB RAM +
256 GB SSD

| image:images/master.png["Master node"]
| c5d
| f16sv2
| N2
| 8 vCPU +
16 GB RAM +
256 GB SSD

| image:images/kibana.png["Kibana node"]
| c6gd
| f16sv2
| N2
| 8-16 vCPU +
8 GB RAM +
256 GB SSD
|===

[discrete]
[[hot-frozen-considerations]]
=== Important considerations


**Updating Data:**

* Typically, time series logging use cases are append-only and there is rarely a need to update documents. The frozen tier is read-only.

**Multi-AZ Frozen Tier:**

* Three availability zones are ideal, but at least two availability zones are recommended to ensure that there will be data nodes available in the event of an AZ failure.

**Shard Management:**

* The most important foundational step to maintaining performance as you scale is proper shard management. This includes even shard distribution among nodes, shard size, and shard count. For a complete understanding of what shards are and how they should be used, please review this documentation on https://www.elastic.co/guide/en/elasticsearch/reference/current/size-your-shards.html[sizing your shards].
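One quick way to review the shard distribution and sizes discussed above is the cat APIs. A sketch:

```console
# List shards sorted by on-disk size, largest first
GET _cat/shards?v=true&s=store:desc

# Summarize shard counts and disk usage per node
GET _cat/allocation?v=true
```

Reviewing these periodically helps catch oversized shards or uneven distribution before they become performance problems.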

**Snapshots:**

* If auditable or business-critical events are being logged, a backup is necessary. The choice to back up data will be up to each business's needs and requirements. Please see this documentation if you need to create a https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshots-register-repository.html[snapshot repository].
* To automate snapshots and attach them to index lifecycle management policies, see https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshots-take-snapshot.html#automate-snapshots-slm[SLM (snapshot lifecycle management)].
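The repository registration and SLM policy referenced above might look like the following sketch; the repository name, bucket, schedule, and retention values are illustrative (S3 is shown, but the repository type varies by platform):

```console
# Register a snapshot repository
PUT _snapshot/my_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my-snapshot-bucket"
  }
}

# Take automated daily snapshots into that repository
PUT _slm/policy/daily-snapshots
{
  "schedule": "0 30 1 * * ?",
  "name": "<daily-snap-{now/d}>",
  "repository": "my_repository",
  "config": {
    "indices": ["*"]
  },
  "retention": {
    "expire_after": "30d"
  }
}
```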

**Kibana:**

* If self-deploying outside of Elasticsearch Service, ensure Kibana is configured for https://www.elastic.co/guide/en/kibana/current/production.html#high-availability[high availability].

[discrete]
[[hot-frozen-estimate]]
=== How many nodes of each do you need?
It depends on:

* The type of data being ingested (such as logs, metrics, traces)
* The retention of searchable data (such as 30 days, 90 days, 1 year)
* The amount of data you need to ingest each day.

You can https://www.elastic.co/contact[contact us] for an estimate and recommended configuration based on your specific scenario.

[discrete]
[[hot-frozen-resources]]
=== Resources and references

* https://www.elastic.co/guide/en/elasticsearch/reference/current/scalability.html[Elasticsearch - Get ready for production]

* https://www.elastic.co/guide/en/cloud/current/ec-prepare-production.html[Elastic Cloud - Preparing a deployment for production]

* https://www.elastic.co/guide/en/elasticsearch/reference/current/size-your-shards.html[Size your shards]
[Six binary image files could not be displayed in the diff view.]
37 changes: 37 additions & 0 deletions docs/reference/reference-architectures/index.asciidoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
[[reference-architectures]]
= Reference architectures

Elasticsearch reference architectures are blueprints for deploying Elasticsearch clusters tailored to different use cases. Whether you're handling logs or metrics, these reference architectures focus on scalability, reliability, and efficient resource utilization. Use these guidelines to deploy Elasticsearch for your use case.

These architectures are designed by architects and engineers to provide standardized, proven solutions that help users follow best practices when deploying Elasticsearch. Some of the key areas of focus are listed below.

* High availability
* Scalability

TIP: These architectures are specific to running your deployment on-premises or in the cloud. If you are using Elastic serverless, your Elasticsearch clusters are autoscaled and fully managed by Elastic. For all the deployment options, see https://www.elastic.co/guide/en/elasticsearch/reference/current/elasticsearch-intro-deploy.html[Run Elasticsearch].

These reference architectures are recommendations and should be adapted to fit your specific environment and needs. Each solution can vary based on the unique requirements and conditions of your deployment. In these architectures we discuss how to deploy cluster components. For information about designing ingest architectures to feed content into your cluster, refer to https://www.elastic.co/guide/en/ingest/current/use-case-arch.html[Ingest architectures].

[discrete]
[[reference-architectures-time-series-2]]
=== Architectures

[cols="50, 50"]
|===
| *Architecture* | *When to use*
| <<hot-frozen-architecture>>

A high availability architecture that is cost optimized for large time-series datasets.

a|
* Have a requirement for cost-effective, long-term data storage (many months or years).
* Provide insights and alerts using logs, metrics, traces, or various event types to ensure optimal performance and quick issue resolution for applications.
* Apply machine learning and Search AI to assist in dealing with the large amount of data.
* Deploy an architecture model that allows for maximum flexibility between storage cost and performance.
| Additional architectures are on the way.

Stay tuned for updates. |

|===

include::hot-frozen.asciidoc[]