Commit 4e9137c

stefnestor, shainaraskas, and dakrone authored
(Doc+) Flush out Data Tiers (#107981) (#111074)
I highly value the content on this [Data Tiers](https://www.elastic.co/guide/en/elasticsearch/reference/current/data-tiers.html) page. Thanks for writing it! In my experience, some users may become slightly confused by its golden nuggets due to its brevity. This PR attempts to flush out common questions while remaining concise.

The main changes are in the first and second-to-last sections; however, I do attempt some heading restructuring to make the TOC idea-groupings more clear for easier scan-throughs.

The specific clarifications I'd like to push in order of appearance:

- There's content tier (for "data category" > "content" as we've dubbed it on the higher page) and the data temperature tiers (for time series). That the temperature tiers group together is technically not stated so users end up asking about when they'd go hot>warm vs content>warm, etc. I suspect this confusion is only because users come straight to this page instead of starting at the hierarchy-parent page so have linked up.
- (Main) Frozen being accessed/searched "rarely" should imply, well rarely. I wrote 1% in the PR `[TIP]` guideline section as a discussion starting point. Frequently we see users not understanding either that they actually have been or that they shouldn't have ≥25% of all searches hitting frozen tier. This comes up because of architecture bugs (e.g. frozen indices with future timestamps) but also just happenstance (e.g. 01605242 where of searches they hit majority hot, ~5% cold, but then again hit 75% frozen).
- There's a slew of "how do I check that?", "how do I change that (at creation/later)?", "what if I set it null?" questions we get about `_tier_preference` so just extended the existing section already about it.

---------

Co-authored-by: shainaraskas <[email protected]>
Co-authored-by: Lee Hinman <[email protected]>
1 parent cd77673 commit 4e9137c

1 file changed

+93
-30
lines changed

docs/reference/datatiers.asciidoc

Lines changed: 93 additions & 30 deletions
@@ -2,43 +2,65 @@
22
[[data-tiers]]
33
== Data tiers
44

5-
A _data tier_ is a collection of nodes with the same data role that
6-
typically share the same hardware profile:
5+
A _data tier_ is a collection of <<modules-node,nodes>> within a cluster that share the same
6+
<<node-roles,data node role>>, and a hardware profile that's appropriately sized for the role. Elastic recommends that nodes in the same tier share the same
7+
hardware profile to avoid <<hotspotting,hot spotting>>.
78

8-
* <<content-tier, Content tier>> nodes handle the indexing and query load for content such as a product catalog.
9-
* <<hot-tier, Hot tier>> nodes handle the indexing load for time series data such as logs or metrics
10-
and hold your most recent, most-frequently-accessed data.
11-
* <<warm-tier, Warm tier>> nodes hold time series data that is accessed less-frequently
9+
The data tiers that you use, and the way that you use them, depend on the data's <<data-management,category>>.
10+
11+
The following data tiers can be used with each data category:
12+
13+
Content data:
14+
15+
* <<content-tier,Content tier>> nodes handle the indexing and query load for non-timeseries
16+
indices, such as a product catalog.
17+
18+
Time series data:
19+
20+
* <<hot-tier,Hot tier>> nodes handle the indexing load for time series data,
21+
such as logs or metrics. They hold your most recent, most-frequently-accessed data.
22+
* <<warm-tier,Warm tier>> nodes hold time series data that is accessed less-frequently
1223
and rarely needs to be updated.
1324
* <<cold-tier,Cold tier>> nodes hold time series data that is accessed
1425
infrequently and not normally updated. To save space, you can keep
1526
<<fully-mounted,fully mounted indices>> of
1627
<<ilm-searchable-snapshot,{search-snaps}>> on the cold tier. These fully mounted
1728
indices eliminate the need for replicas, reducing required disk space by
1829
approximately 50% compared to the regular indices.
19-
* <<frozen-tier, Frozen tier>> nodes hold time series data that is accessed
30+
* <<frozen-tier,Frozen tier>> nodes hold time series data that is accessed
2031
rarely and never updated. The frozen tier stores <<partially-mounted,partially
2132
mounted indices>> of <<ilm-searchable-snapshot,{search-snaps}>> exclusively.
2233
This extends the storage capacity even further — by up to 20 times compared to
2334
the warm tier.
2435

25-
TIP: The performance of an {es} node is often limited by the performance of the underlying storage.
36+
TIP: The performance of an {es} node is often limited by the performance of the underlying storage and hardware profile.
37+
For example hardware profiles, refer to Elastic Cloud's {cloud}/ec-reference-hardware.html[instance configurations].
2638
Review our recommendations for optimizing your storage for <<indexing-use-faster-hardware,indexing>> and <<search-use-faster-hardware,search>>.
2739

2840
IMPORTANT: {es} generally expects nodes within a data tier to share the same
2941
hardware profile. Variations not following this recommendation should be
3042
carefully architected to avoid <<hotspotting,hot spotting>>.
3143

32-
When you index documents directly to a specific index, they remain on content tier nodes indefinitely.
44+
The way data tiers are used often depends on the data's category:
45+
46+
- Content data remains on the <<content-tier,content tier>> for its entire
47+
data lifecycle.
3348

34-
When you index documents to a data stream, they initially reside on hot tier nodes.
35-
You can configure <<index-lifecycle-management, {ilm}>> ({ilm-init}) policies
36-
to automatically transition your time series data through the hot, warm, and cold tiers
37-
according to your performance, resiliency and data retention requirements.
49+
- Time series data may progress through the
50+
descending temperature data tiers (hot, warm, cold, and frozen) according to your
51+
performance, resiliency, and data retention requirements.
52+
+
53+
You can automate these lifecycle transitions using the <<data-streams,data stream lifecycle>>, or custom <<index-lifecycle-management,{ilm}>>.
54+
55+
[discrete]
56+
[[available-tier]]
57+
=== Available data tiers
58+
59+
Learn more about each data tier, including when and how it should be used.
3860

3961
[discrete]
4062
[[content-tier]]
41-
=== Content tier
63+
==== Content tier
4264

4365
// tag::content-tier[]
4466
Data stored in the content tier is generally a collection of items such as a product catalog or article archive.
@@ -53,13 +75,14 @@ While they are also responsible for indexing, content data is generally not inge
5375
as time series data such as logs and metrics. From a resiliency perspective the indices in this
5476
tier should be configured to use one or more replicas.
5577

56-
The content tier is required. System indices and other indices that aren't part
57-
of a data stream are automatically allocated to the content tier.
78+
The content tier is required and is often deployed within the same node
79+
grouping as the hot tier. System indices and other indices that aren't part
80+
of a data stream are automatically allocated to the content tier.
5881
// end::content-tier[]
5982

6083
[discrete]
6184
[[hot-tier]]
62-
=== Hot tier
85+
==== Hot tier
6386

6487
// tag::hot-tier[]
6588
The hot tier is the {es} entry point for time series data and holds your most-recent,
@@ -74,7 +97,7 @@ data stream>> are automatically allocated to the hot tier.
7497

7598
[discrete]
7699
[[warm-tier]]
77-
=== Warm tier
100+
==== Warm tier
78101

79102
// tag::warm-tier[]
80103
Time series data can move to the warm tier once it is being queried less frequently
@@ -87,7 +110,7 @@ For resiliency, indices in the warm tier should be configured to use one or more
87110

88111
[discrete]
89112
[[cold-tier]]
90-
=== Cold tier
113+
==== Cold tier
91114

92115
// tag::cold-tier[]
93116
When you no longer need to search time series data regularly, it can move from
@@ -109,7 +132,7 @@ but doesn't reduce required disk space compared to the warm tier.
109132

110133
[discrete]
111134
[[frozen-tier]]
112-
=== Frozen tier
135+
==== Frozen tier
113136

114137
// tag::frozen-tier[]
115138
Once data is no longer being queried, or being queried rarely, it may move from
@@ -123,9 +146,15 @@ sometimes fetch frozen data from the snapshot repository, searches on the frozen
123146
tier are typically slower than on the cold tier.
124147
// end::frozen-tier[]
125148

149+
[discrete]
150+
[[configure-data-tiers]]
151+
=== Configure data tiers
152+
153+
Follow the instructions for your deployment type to configure data tiers.
154+
126155
[discrete]
127156
[[configure-data-tiers-cloud]]
128-
=== Configure data tiers on {ess} or {ece}
157+
==== {ess} or {ece}
129158

130159
The default configuration for an {ecloud} deployment includes a shared tier for
131160
hot and content data. This tier is required and can't be removed.
@@ -159,7 +188,7 @@ tier].
159188

160189
[discrete]
161190
[[configure-data-tiers-on-premise]]
162-
=== Configure data tiers for self-managed deployments
191+
==== Self-managed deployments
163192

164193
For self-managed deployments, each node's <<data-node,data role>> is configured
165194
in `elasticsearch.yml`. For example, the highest-performance nodes in a cluster
@@ -177,25 +206,59 @@ tier.
177206
[[data-tier-allocation]]
178207
=== Data tier index allocation
179208

180-
When you create an index, by default {es} sets
181-
<<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
209+
The <<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>> setting determines which tier the index should be allocated to.
210+
211+
When you create an index, by default {es} sets the `_tier_preference`
182212
to `data_content` to automatically allocate the index shards to the content tier.
183213

184214
When {es} creates an index as part of a <<data-streams, data stream>>,
185-
by default {es} sets
186-
<<tier-preference-allocation-filter, `index.routing.allocation.include._tier_preference`>>
215+
by default {es} sets the `_tier_preference`
187216
to `data_hot` to automatically allocate the index shards to the hot tier.
188217

189-
You can explicitly set `index.routing.allocation.include._tier_preference`
190-
to opt out of the default tier-based allocation.
218+
At the time of index creation, you can override the default setting by explicitly setting
219+
the preferred value in one of two ways:
220+
221+
- Using an <<index-templates,index template>>. Refer to <<getting-started-index-lifecycle-management,Automate rollover with ILM>> for details.
222+
- Within the <<indices-create-index,create index>> request body.
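
As a minimal sketch of the second option, a create index request might set the preference in its `settings` body. The index name `my-index-000002` and the chosen tier order are assumptions for illustration only:

[source,console]
--------------------------------------------------
PUT /my-index-000002
{
  "settings": {
    "index.routing.allocation.include._tier_preference": "data_warm,data_hot"
  }
}
--------------------------------------------------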
223+
224+
You can override this
225+
setting after index creation by <<indices-update-settings,updating the index setting>> to the preferred
226+
value.
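
For example, assuming an existing index with the hypothetical name `my-index-000001`, an update index settings request along these lines could change the preference to the hot tier:

[source,console]
--------------------------------------------------
PUT /my-index-000001/_settings
{
  "index.routing.allocation.include._tier_preference": "data_hot"
}
--------------------------------------------------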
227+
228+
This setting also accepts multiple tiers in order of preference. This prevents indices from remaining unallocated if no nodes are available in the preferred tier. For example, when {ilm} migrates an index to the cold phase, it sets the index `_tier_preference` to `data_cold,data_warm,data_hot`.
229+
230+
To remove the data tier preference
231+
setting, set the `_tier_preference` value to `null`. This allows the index to allocate to any data node within the cluster. Setting the `_tier_preference` to `null` does not restore the default value. Note that, in the case of managed indices, a <<ilm-migrate,migrate>> action might apply a new value in its place.
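
A minimal sketch of removing the preference, again assuming a hypothetical existing index named `my-index-000001`:

[source,console]
--------------------------------------------------
PUT /my-index-000001/_settings
{
  "index.routing.allocation.include._tier_preference": null
}
--------------------------------------------------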
232+
233+
[discrete]
234+
[[data-tier-allocation-value]]
235+
==== Determine the current data tier preference
236+
237+
You can check an existing index's data tier preference by <<indices-get-settings,polling its
238+
settings>> for `index.routing.allocation.include._tier_preference`:
239+
240+
[source,console]
241+
--------------------------------------------------
242+
GET /my-index-000001/_settings?filter_path=*.settings.index.routing.allocation.include._tier_preference
243+
--------------------------------------------------
244+
// TEST[setup:my_index]
245+
246+
[discrete]
247+
[[data-tier-allocation-troubleshooting]]
248+
==== Troubleshooting
249+
250+
The `_tier_preference` setting might conflict with other allocation settings. This conflict might prevent the shard from allocating. A conflict might occur when a cluster has not yet been completely <<troubleshoot-migrate-to-tiers,migrated
251+
to data tiers>>.
252+
253+
This setting will not unallocate a currently allocated shard, but might prevent it from migrating from its current location to its designated data tier. To troubleshoot, call the <<cluster-allocation-explain,cluster allocation explain API>> and specify the suspected problematic shard.
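
As a sketch, a cluster allocation explain request for one shard might look like the following. The index name and shard number are assumptions for illustration; substitute the shard you suspect is affected:

[source,console]
--------------------------------------------------
GET /_cluster/allocation/explain
{
  "index": "my-index-000001",
  "shard": 0,
  "primary": true
}
--------------------------------------------------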
191254

192255
[discrete]
193256
[[data-tier-migration]]
194-
=== Automatic data tier migration
257+
==== Automatic data tier migration
195258

196259
{ilm-init} automatically transitions managed
197260
indices through the available data tiers using the <<ilm-migrate, migrate>> action.
198261
By default, this action is automatically injected in every phase.
199-
You can explicitly specify the migrate action with `"enabled": false` to disable automatic migration,
262+
You can explicitly specify the migrate action with `"enabled": false` to <<ilm-disable-migrate-ex,disable automatic migration>>,
200263
for example, if you're using the <<ilm-allocate, allocate action>> to manually
201264
specify allocation rules.
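
The following sketch shows what a policy with automatic migration disabled in the warm phase could look like. The policy name `my-policy` and the custom node attribute `rack_id` with its values are hypothetical placeholders:

[source,console]
--------------------------------------------------
PUT /_ilm/policy/my-policy
{
  "policy": {
    "phases": {
      "warm": {
        "actions": {
          "migrate": {
            "enabled": false
          },
          "allocate": {
            "include": {
              "rack_id": "one,two"
            }
          }
        }
      }
    }
  }
}
--------------------------------------------------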
