Commit 264f5e9

101 prod exploration
1 parent f05c9b0 commit 264f5e9

12 files changed: +272 -170 lines changed

docs/reference/data-management.asciidoc

Lines changed: 17 additions & 26 deletions
@@ -4,31 +4,10 @@
 
 [partintro]
 --
-The data you store in {es} generally falls into one of two categories:
 
-* Content: a collection of items you want to search, such as a catalog of products
-* Time series data: a stream of continuously-generated timestamped data, such as log entries
+include::{es-ref-dir}/lifecycle-options.asciidoc[]
 
-Content might be frequently updated,
-but the value of the content remains relatively constant over time.
-You want to be able to retrieve items quickly regardless of how old they are.
-
-Time series data keeps accumulating over time, so you need strategies for
-balancing the value of the data against the cost of storing it.
-As it ages, it tends to become less important and less-frequently accessed,
-so you can move it to less expensive, less performant hardware.
-For your oldest data, what matters is that you have access to the data.
-It's ok if queries take longer to complete.
-
-To help you manage your data, {es} offers you:
-
-* <<index-lifecycle-management, {ilm-cap}>> ({ilm-init}) to manage both indices and data streams and it is fully customisable, and
-* <<data-stream-lifecycle, Data stream lifecycle>> which is the built-in lifecycle of data streams and addresses the most
-common lifecycle management needs.
-
-preview::["The built-in data stream lifecycle is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but this feature is not subject to the support SLA of official GA features."]
-
-**{ilm-init}** can be used to manage both indices and data streams and it allows you to:
+**{ilm-init}** can be used to manage both indices and data streams. It allows you to do the following:
 
 * Define the retention period of your data. The retention period is the minimum time your data will be stored in {es}.
 Data older than this period can be deleted by {es}.
@@ -38,12 +17,24 @@ Data older than this period can be deleted by {es}.
 for your older indices while reducing operating costs and maintaining search performance.
 * Perform <<async-search-intro, asynchronous searches>> of data stored on less-performant hardware.
 
-**Data stream lifecycle** is less feature rich but is focused on simplicity, so it allows you to easily:
+**Data stream lifecycle** is less feature rich but is focused on simplicity. It allows you to do the following:
 
 * Define the retention period of your data. The retention period is the minimum time your data will be stored in {es}.
 Data older than this period can be deleted by {es} at a later time.
-* Improve the performance of your data stream by performing background operations that will optimise the way your data
-stream is stored.
+* Improve the performance of your data stream by performing background operations that will optimize the way your data stream is stored.
+
+**Elastic Curator** is a tool that allows you to manage your indices and snapshots using user-defined filters and predefined actions. If {ilm-init} provides the functionality to manage your index lifecycle and you have at least a Basic license, consider using {ilm-init} in place of Curator. Many stack components use {ilm-init} by default. {curator-ref-current}/ilm.html[Learn more].
+
+NOTE: <<xpack-rollup,data rollup>> is a deprecated {es} feature that allows you to manage the amount of data that is stored in your cluster, similar to the downsampling functionality of {ilm-init} and data stream lifecycle. This feature should not be used for new deployments.
+
+[TIP]
+====
+{ilm-init} is not available on {es-serverless}.
+
+In an {cloud} or self-managed environment, {ilm-init} lets you automatically transition indices through data tiers according to your performance needs and retention requirements. This allows you to balance hardware costs with performance. {es-serverless} eliminates this complexity by optimizing your cluster performance for you.
+
+Data stream lifecycle is an optimized lifecycle tool that lets you focus on the most common lifecycle management needs, without unnecessary hardware-centric concepts like data tiers.
+====
 --
 
 include::ilm/index.asciidoc[]
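
For reviewers comparing the two options described in this file, here are minimal sketches of each. The policy name, data stream name, rollover size, and 30-day retention are illustrative assumptions, not part of this commit. An {ilm-init} policy that rolls over at a size threshold and deletes old data might look like:

[source,console]
----
PUT _ilm/policy/my-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_primary_shard_size": "50gb" }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}
----

The same retention need, expressed with the simpler data stream lifecycle:

[source,console]
----
PUT _data_stream/my-data-stream/_lifecycle
{
  "data_retention": "30d"
}
----
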
docs/reference/data-store-architecture.asciidoc

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+= Data store architecture
+
+[partintro]
+--
+
+{es} is a distributed document store. Instead of storing information as rows of columnar data, {es} stores complex data structures that have been serialized as JSON documents. When you have multiple {es} nodes in a cluster, stored documents are distributed across the cluster and can be accessed immediately
+from any node.
+
+The topics in this section provide information about the architecture of {es} and how it stores and retrieves data:
+
+<<nodes-shards,Nodes and shards>>: Learn about the basic building blocks of an {es} cluster, including nodes, shards, primaries, and replicas.
+<<docs-replication,Reading and writing documents>>: Learn how {es} replicates read and write operations across shards and shard copies.
+--
+
+include::nodes-shards.asciidoc[]
+include::docs/data-replication.asciidoc[leveloffset=-1]
+include::modules/shard-ops.asciidoc[]
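
A quick illustration of the document model this new page describes; the index name and document are invented for the example. Any node in the cluster can serve the read:

[source,console]
----
PUT my-index/_doc/1
{
  "title": "Distributed by design",
  "body": "Documents are stored as JSON and replicated across nodes."
}

GET my-index/_doc/1
----
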

docs/reference/docs.asciidoc

Lines changed: 1 addition & 5 deletions
@@ -1,9 +1,7 @@
 [[docs]]
 == Document APIs
 
-This section starts with a short introduction to {es}'s <<docs-replication,data
-replication model>>, followed by a detailed description of the following CRUD
-APIs:
+This section describes the following CRUD APIs:
 
 .Single document APIs
 * <<docs-index_>>
@@ -18,8 +16,6 @@ APIs:
 * <<docs-update-by-query>>
 * <<docs-reindex>>
 
-include::docs/data-replication.asciidoc[]
-
 include::docs/index_.asciidoc[]
 
 include::docs/get.asciidoc[]

docs/reference/docs/data-replication.asciidoc

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 
 [[docs-replication]]
-=== Reading and Writing documents
+=== Reading and writing documents
 
 [discrete]
 ==== Introduction

docs/reference/high-availability.asciidoc

Lines changed: 0 additions & 2 deletions
@@ -26,5 +26,3 @@ to achieve high availability despite failures.
 --
 
 include::high-availability/cluster-design.asciidoc[]
-
-include::ccr/index.asciidoc[]

docs/reference/index.asciidoc

Lines changed: 13 additions & 7 deletions
@@ -60,24 +60,30 @@ include::geospatial-analysis.asciidoc[]
 
 include::watcher/index.asciidoc[]
 
-// cluster management
-
-include::monitoring/index.asciidoc[]
+// production tasks
 
-include::security/index.asciidoc[]
+include::production.asciidoc[]
 
-// production tasks
+include::how-to.asciidoc[]
 
 include::high-availability.asciidoc[]
 
-include::how-to.asciidoc[]
+include::snapshot-restore/index.asciidoc[]
+
+include::ccr/index.asciidoc[leveloffset=-1]
 
 include::autoscaling/index.asciidoc[]
 
-include::snapshot-restore/index.asciidoc[]
+// cluster management
+
+include::security/index.asciidoc[]
+
+include::monitoring/index.asciidoc[]
 
 // reference
 
+include::data-store-architecture.asciidoc[]
+
 include::rest-api/index.asciidoc[]
 
 include::commands/index.asciidoc[]

docs/reference/intro.asciidoc

Lines changed: 0 additions & 118 deletions
@@ -375,121 +375,3 @@ Does not yet support full-text search.
 | N/A
 
 |===
-
-// New html page
-[[scalability]]
-=== Get ready for production
-
-Many teams rely on {es} to run their key services. To keep these services running, you can design your {es} deployment
-to keep {es} available, even in case of large-scale outages. To keep it running fast, you also can design your
-deployment to be responsive to production workloads.
-
-{es} is built to be always available and to scale with your needs. It does this using a distributed architecture.
-By distributing your cluster, you can keep Elastic online and responsive to requests.
-
-In case of failure, {es} offers tools for cross-cluster replication and cluster snapshots that can
-help you fall back or recover quickly. You can also use cross-cluster replication to serve requests based on the
-geographic location of your users and your resources.
-
-{es} also offers security and monitoring tools to help you keep your cluster highly available.
-
-[discrete]
-[[use-multiple-nodes-shards]]
-==== Use multiple nodes and shards
-
-[NOTE]
-====
-Nodes and shards are what make {es} distributed and scalable.
-
-These concepts aren’t essential if you’re just getting started. How you <<elasticsearch-intro-deploy,deploy {es}>> in production determines what you need to know:
-
-* *Self-managed {es}*: You are responsible for setting up and managing nodes, clusters, shards, and replicas. This includes
-managing the underlying infrastructure, scaling, and ensuring high availability through failover and backup strategies.
-* *Elastic Cloud*: Elastic can autoscale resources in response to workload changes. Choose from different deployment types
-to apply sensible defaults for your use case. A basic understanding of nodes, shards, and replicas is still important.
-* *Elastic Cloud Serverless*: You don’t need to worry about nodes, shards, or replicas. These resources are 100% automated
-on the serverless platform, which is designed to scale with your workload.
-====
-
-You can add servers (_nodes_) to a cluster to increase capacity, and {es} automatically distributes your data and query load
-across all of the available nodes.
-
-Elastic is able to distribute your data across nodes by subdividing an index into _shards_. Each index in {es} is a grouping
-of one or more physical shards, where each shard is a self-contained Lucene index containing a subset of the documents in
-the index. By distributing the documents in an index across multiple shards, and distributing those shards across multiple
-nodes, {es} increases indexing and query capacity.
-
-There are two types of shards: _primaries_ and _replicas_. Each document in an index belongs to one primary shard. A replica
-shard is a copy of a primary shard. Replicas maintain redundant copies of your data across the nodes in your cluster.
-This protects against hardware failure and increases capacity to serve read requests like searching or retrieving a document.
-
-[TIP]
-====
-The number of primary shards in an index is fixed at the time that an index is created, but the number of replica shards can
-be changed at any time, without interrupting indexing or query operations.
-====
-
-Shard copies in your cluster are automatically balanced across nodes to provide scale and high availability. All nodes are
-aware of all the other nodes in the cluster and can forward client requests to the appropriate node. This allows {es}
-to distribute indexing and query load across the cluster.
-
-If you’re exploring {es} for the first time or working in a development environment, then you can use a cluster with a single node and create indices
-with only one shard. However, in a production environment, you should build a cluster with multiple nodes and indices
-with multiple shards to increase performance and resilience.
-
-// TODO - diagram
-
-To learn about optimizing the number and size of shards in your cluster, refer to <<size-your-shards,Size your shards>>.
-To learn about how read and write operations are replicated across shards and shard copies, refer to <<docs-replication,Reading and writing documents>>.
-To adjust how shards are allocated and balanced across nodes, refer to <<shard-allocation-relocation-recovery,Shard allocation, relocation, and recovery>>.
-
-[discrete]
-[[ccr-disaster-recovery-geo-proximity]]
-==== CCR for disaster recovery and geo-proximity
-
-To effectively distribute read and write operations across nodes, the nodes in a cluster need good, reliable connections
-to each other. To provide better connections, you typically co-locate the nodes in the same data center or nearby data centers.
-
-Co-locating nodes in a single location exposes you to the risk of a single outage taking your entire cluster offline. To
-maintain high availability, you can prepare a second cluster that can take over in case of disaster by implementing
-cross-cluster replication (CCR).
-
-CCR provides a way to automatically synchronize indices from your primary cluster to a secondary remote cluster that
-can serve as a hot backup. If the primary cluster fails, the secondary cluster can take over.
-
-You can also use CCR to create secondary clusters to serve read requests in geo-proximity to your users.
-
-Learn more about <<xpack-ccr,cross-cluster replication>> and about <<high-availability-cluster-design,designing for resilience>>.
-
-[TIP]
-====
-You can also take <<snapshot-restore,snapshots>> of your cluster that can be restored in case of failure.
-====
-
-[discrete]
-[[security-and-monitoring]]
-==== Security and monitoring
-
-As with any enterprise system, you need tools to secure, manage, and monitor your {es} clusters. Security,
-monitoring, and administrative features that are integrated into {es} enable you to use {kibana-ref}/introduction.html[Kibana] as a
-control center for managing a cluster.
-
-<<secure-cluster,Learn about securing an {es} cluster>>.
-
-<<monitor-elasticsearch-cluster,Learn about monitoring your cluster>>.
-
-[discrete]
-[[cluster-design]]
-==== Cluster design
-
-{es} offers many options that allow you to configure your cluster to meet your organization’s goals, requirements,
-and restrictions. You can review the following guides to learn how to tune your cluster to meet your needs:
-
-* <<high-availability-cluster-design,Designing for resilience>>
-* <<tune-for-indexing-speed,Tune for indexing speed>>
-* <<tune-for-search-speed,Tune for search speed>>
-* <<tune-for-disk-usage,Tune for disk usage>>
-* <<use-elasticsearch-for-time-series-data,Tune for time series data>>
-
-Many {es} options come with different performance considerations and trade-offs. The best way to determine the
-optimal configuration for your use case is through https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing[testing with your own data and queries].
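
The deleted passage's key point about primaries and replicas is easy to demonstrate. A minimal sketch with invented names and counts: the number of primary shards is fixed when the index is created, while the replica count can be changed at any time.

[source,console]
----
PUT my-index
{
  "settings": {
    "index.number_of_shards": 3,
    "index.number_of_replicas": 1
  }
}

PUT my-index/_settings
{
  "index.number_of_replicas": 2
}
----
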
docs/reference/lifecycle-options.asciidoc

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+The data you store in {es} generally falls into one of two categories:
+
+* *Content*: a collection of items you want to search, such as a catalog of products
+* *Time series data*: a stream of continuously-generated timestamped data, such as log entries
+
+*Content* might be frequently updated,
+but the value of the content remains relatively constant over time.
+You want to be able to retrieve items quickly regardless of how old they are.
+
+*Time series data* keeps accumulating over time, so you need strategies for
+balancing the value of the data against the cost of storing it.
+As it ages, it tends to become less important and less-frequently accessed,
+so you can move it to less expensive, less performant hardware.
+For your oldest data, what matters is that you have access to the data.
+It's ok if queries take longer to complete.
+
+To help you manage your data, {es} offers you the following options:
+
+* <<index-lifecycle-management, {ilm-cap}>>
+* <<data-stream-lifecycle, Data stream lifecycle>>
+* {curator-ref-current}/about.html[Elastic Curator]

docs/reference/modules/shard-ops.asciidoc

Lines changed: 17 additions & 9 deletions
@@ -1,5 +1,5 @@
 [[shard-allocation-relocation-recovery]]
-=== Shard allocation, relocation, and recovery
+== Shard allocation, relocation, and recovery
 
 Each <<documents-indices,index>> in Elasticsearch is divided into one or more <<scalability,shards>>.
 Each document in an index belongs to a single shard.
@@ -12,22 +12,25 @@ Over the course of normal operation, Elasticsearch allocates shard copies to nodes
 
 TIP: To learn about optimizing the number and size of shards in your cluster, refer to <<size-your-shards,Size your shards>>. To learn about how read and write operations are replicated across shards and shard copies, refer to <<docs-replication,Reading and writing documents>>.
 
+[discrete]
 [[shard-allocation]]
-==== Shard allocation
+=== Shard allocation
 
 include::{es-ref-dir}/modules/shard-allocation-desc.asciidoc[]
 
 By default, the primary and replica shard copies for an index can be allocated to any node in the cluster, and may be relocated to rebalance the cluster.
 
-===== Adjust shard allocation settings
+[discrete]
+==== Adjust shard allocation settings
 
 You can control how shard copies are allocated using the following settings:
 
 - <<modules-cluster,Cluster-level shard allocation settings>>: Use these settings to control how shard copies are allocated and balanced across the entire cluster. For example, you might want to allocate nodes across availability zones, or prevent certain nodes from being used so you can perform maintenance.
 
 - <<index-modules-allocation,Index-level shard allocation settings>>: Use these settings to control how the shard copies for a specific index are allocated. For example, you might want to allocate an index to a node in a specific data tier, or to a node with specific attributes.
 
-===== Monitor shard allocation
+[discrete]
+==== Monitor shard allocation
 
 If a shard copy is unassigned, it means that the shard copy is not allocated to any node in the cluster. This can happen if there are not enough nodes in the cluster to allocate the shard copy, or if the shard copy can't be allocated to any node that satisfies the shard allocation filtering rules. When a shard copy is unassigned, your cluster is considered unhealthy and returns a yellow or red cluster health status.
 
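
To ground the allocation settings and monitoring discussion above, a sketch; the node name is hypothetical and the calls are illustrative. Exclude a node from allocation before maintenance, then check where shard copies sit and why any copy is unassigned:

[source,console]
----
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.exclude._name": "node-to-maintain"
  }
}

GET _cat/shards?v=true

GET _cluster/allocation/explain
----
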
@@ -39,12 +42,14 @@ You can use the following APIs to monitor shard allocation:
 
 <<red-yellow-cluster-status,Learn more about troubleshooting unassigned shard copies and recovering your cluster health>>.
 
+[discrete]
 [[shard-recovery]]
-==== Shard recovery
+=== Shard recovery
 
 include::{es-ref-dir}/modules/shard-recovery-desc.asciidoc[]
 
-===== Adjust shard recovery settings
+[discrete]
+==== Adjust shard recovery settings
 
 To control how shards are recovered (for example, the resources that recovery operations can use, and which indices should be prioritized for recovery), you can adjust the following settings:
 
@@ -54,21 +59,24 @@ To control how shards are recovered, for example the resources that can be used
 
 Shard recovery operations also respect general shard allocation settings.
 
-===== Monitor shard recovery
+[discrete]
+==== Monitor shard recovery
 
 You can use the following APIs to monitor shard recovery:
 
 - View a list of in-progress and completed recoveries using the <<cat-recovery,cat recovery API>>
 - View detailed information about a specific recovery using the <<indices-recovery,index recovery API>>
 
+[discrete]
 [[shard-relocation]]
-==== Shard relocation
+=== Shard relocation
 
 Shard relocation is the process of moving shard copies from one node to another. This can happen when a node joins or leaves the cluster, or when the cluster is rebalancing.
 
 When a shard copy is relocated, it is created as a new shard copy on the target node. When the shard copy is fully allocated and recovered, the old shard copy is deleted. If the shard copy being relocated is a primary, then the new shard copy is marked as primary before the old shard copy is deleted.
 
-===== Adjust shard relocation settings
+[discrete]
+==== Adjust shard relocation settings
 
 You can control how and when shard copies are relocated. For example, you can adjust the rebalancing settings that control when shard copies are relocated to balance the cluster, or the high watermark for disk-based shard allocation that can trigger relocation. These settings are part of the <<modules-cluster,cluster-level shard allocation settings>>.
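
Closing the loop on the recovery and relocation sections, a sketch of the monitoring call and two of the knobs mentioned above; the values are illustrative, not recommendations. Watch active recoveries, cap recovery bandwidth, and adjust the disk high watermark that can trigger relocation:

[source,console]
----
GET _cat/recovery?v=true&active_only=true

PUT _cluster/settings
{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "100mb",
    "cluster.routing.allocation.disk.watermark.high": "90%"
  }
}
----
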
