Commit 0e6233a

committed
performance optimizations optimized
1 parent 880ab0e commit 0e6233a

4 files changed

+75
-16
lines changed

deploy-manage/production-guidance/optimize-performance.md

Lines changed: 4 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -12,17 +12,16 @@ applies_to:
1212

1313
# Elasticsearch performance optimizations [how-to]
1414

15+
% investigate if any of the optimizations apply to serverless also.
1516
Elasticsearch’s default settings provide a good out-of-box experience for basic operations like full text search, highlighting, aggregations, and indexing.
1617

17-
However, there are a number of optimizations you can make to improve performance for your use case.
18+
However, there are a number of optimizations you can make to improve performance for your use case. This section includes both deployment-level configuration suggestions and usage-level guidance to optimize the performance of your cluster.
1819

19-
This section provides recommendations for various use cases.
20+
Use the following topics to explore relevant strategies:
2021

2122
* [General recommendations](general-recommendations.md)
22-
* [Size your shards](optimize-performance/size-shards.md)
2323
* [Tune for indexing speed](optimize-performance/indexing-speed.md)
2424
* [Tune for search speed](optimize-performance/search-speed.md)
2525
* [Tune approximate kNN search](optimize-performance/approximate-knn-search.md)
2626
* [Tune for disk usage](optimize-performance/disk-usage.md)
27-
28-
27+
* [Size your shards](optimize-performance/size-shards.md)

deploy-manage/production-guidance/optimize-performance/disk-usage.md

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ applies_to:
1111

1212
# Disk usage [tune-for-disk-usage]
1313

14+
This page provides strategies to reduce the storage footprint of your Elasticsearch indices. Disk usage is influenced by field mappings, index settings, document structure, and how you manage segments and shards. Use these recommendations to improve compression, eliminate unnecessary data, and optimize storage for your specific use case.
1415

1516
## Disable the features you do not need [_disable_the_features_you_do_not_need]
1617

deploy-manage/production-guidance/optimize-performance/indexing-speed.md

Lines changed: 36 additions & 6 deletions
@@ -11,6 +11,13 @@ applies_to:
1111

1212
# Indexing speed [tune-for-indexing-speed]
1313

14+
Elasticsearch offers a wide range of indexing performance optimizations, especially useful for high-throughput ingestion workloads. This page provides practical recommendations to help you maximize indexing speed, from bulk sizing and refresh intervals to hardware and thread management.
15+
16+
::::{note}
17+
Indexing performance is also affected by your sharding and indexing strategies. Whether you’re indexing into a single index or hundreds in parallel, and how many shards each index has, can significantly influence indexing speed.
18+
19+
Also consider your cluster’s shard count, index layout, and overall data distribution when tuning for indexing speed. Refer to [](./size-shards.md) for more details about sharding strategies and recommendations.
20+
::::
1421

1522
## Use bulk requests [_use_bulk_requests]
1623

@@ -21,11 +28,12 @@ Bulk requests will yield much better performance than single-document index requ
2128

2229
A single thread sending bulk requests is unlikely to be able to max out the indexing capacity of an Elasticsearch cluster. In order to use all resources of the cluster, you should send data from multiple threads or processes. In addition to making better use of the resources of the cluster, this should help reduce the cost of each fsync.
2330

31+
On the other hand, sending data to a single shard from too many concurrent threads or processes can overwhelm the cluster. If the indexing load exceeds what {{es}} can handle, it may become a bottleneck and start rejecting requests or slowing down overall performance.
32+
2433
Make sure to watch for `TOO_MANY_REQUESTS (429)` response codes (`EsRejectedExecutionException` with the Java client), which is the way that Elasticsearch tells you that it cannot keep up with the current indexing rate. When it happens, you should pause indexing a bit before trying again, ideally with randomized exponential backoff.
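The retry advice above can be sketched as follows. This is a minimal illustration, not part of the commit or of any Elasticsearch client: `send_bulk` is a hypothetical callable that sends one bulk request and returns the HTTP status code, and the `base`, `cap`, and `max_retries` values are assumed defaults that you would tune by testing.

```python
import random
import time


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Return a randomized delay in seconds for the given retry attempt.

    The upper bound doubles with each attempt (exponential) up to `cap`,
    and the actual delay is drawn uniformly below that bound (randomized).
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


def send_with_backoff(send_bulk, max_retries=5, sleep=time.sleep):
    """Call send_bulk() until it stops returning 429 or retries run out.

    `send_bulk` is a hypothetical callable returning an HTTP status code.
    The last status seen is returned so the caller can decide what to do.
    """
    for attempt in range(max_retries + 1):
        status = send_bulk()
        if status != 429:
            return status
        if attempt < max_retries:
            # Pause before retrying so the cluster can catch up.
            sleep(backoff_delay(attempt))
    return status
```

Randomizing the delay keeps many workers that were rejected at the same moment from all retrying at the same moment.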
2534

2635
Similarly to sizing bulk requests, only testing can tell what the optimal number of workers is. This can be tested by progressively increasing the number of workers until either I/O or CPU is saturated on the cluster.
2736

28-
2937
## Unset or increase the refresh interval [_unset_or_increase_the_refresh_interval]
3038

3139
The operation that consists of making changes visible to search - called a [refresh](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-refresh) - is costly, and calling it often while there is ongoing indexing activity can hurt indexing speed.
@@ -45,14 +53,28 @@ If `index.refresh_interval` is configured in the index settings, it may further
4553

4654

4755
## Disable swapping [_disable_swapping_2]
48-
56+
```yaml {applies_to}
57+
deployment:
58+
self: all
59+
```
4960
You should make sure that the operating system is not swapping out the Java process by [disabling swapping](../../deploy/self-managed/setup-configuration-memory.md).
5061
51-
5262
## Give memory to the filesystem cache [_give_memory_to_the_filesystem_cache]
63+
```yaml {applies_to}
64+
deployment:
65+
self: all
66+
eck: all
67+
```
5368
54-
The filesystem cache will be used in order to buffer I/O operations. You should make sure to give at least half the memory of the machine running Elasticsearch to the filesystem cache.
69+
The filesystem cache is used to buffer I/O operations and plays a critical role in {{es}} performance. You should make sure to give at least half of the system's memory to the filesystem cache.
5570
71+
By default, {{es}} automatically sets its [JVM heap size](/deploy-manage/deploy/self-managed/important-settings-configuration.md#heap-size-settings) to follow this best practice. However, in self-managed or {{eck}} deployments, you have the flexibility to allocate even more memory to the filesystem cache.
72+
73+
While the filesystem cache primarily benefits search workloads, it can also improve indexing speed in certain scenarios—especially when indexing into many shards or performing frequent segment merges that involve reading existing data.
74+
75+
::::{note}
76+
On Linux, the filesystem cache uses any memory not actively used by applications. To allocate memory to the cache, ensure that enough system memory remains available and is not consumed by {{es}} or other processes.
77+
::::
5678
5779
## Use auto-generated ids [_use_auto_generated_ids]
5880
@@ -65,8 +87,17 @@ If indexing is I/O-bound, consider increasing the size of the filesystem cache (
6587
6688
Stripe your index across multiple SSDs by configuring a RAID 0 array. Remember that it will increase the risk of failure since the failure of any one SSD destroys the index. However, this is typically the right tradeoff to make: optimize single shards for maximum performance, and then add replicas across different nodes so there’s redundancy for any node failures. You can also use [snapshot and restore](../../tools/snapshot-and-restore.md) to back up the index for further insurance.
6789
90+
::::{note}
91+
In {{ech}} and {{ece}}, you can choose the underlying hardware by selecting different hardware profiles or deployment templates. Refer to [ECH>Change hardware](/deploy-manage/deploy/elastic-cloud/change-hardware.md) and [ECE deployment templates](/deploy-manage/deploy/cloud-enterprise/configure-deployment-templates.md) for more details.
92+
::::
6893
6994
### Local vs. remote storage [_local_vs_remote_storage]
95+
```yaml {applies_to}
96+
deployment:
97+
self: all
98+
eck: all
99+
ece: all
100+
```
70101
71102
Directly-attached (local) storage generally performs better than remote storage because it is simpler to configure well and avoids communications overheads.
72103
@@ -95,5 +126,4 @@ Within a single cluster, indexing and searching can compete for resources. By se
95126

96127
## Additional optimizations [_additional_optimizations]
97128

98-
Many of the strategies outlined in [*Tune for disk usage*](disk-usage.md) also provide an improvement in the speed of indexing.
99-
129+
Many of the strategies outlined in [Tune for disk usage](disk-usage.md) can also help improve indexing speed.

deploy-manage/production-guidance/optimize-performance/search-speed.md

Lines changed: 34 additions & 5 deletions
@@ -11,13 +11,36 @@ applies_to:
1111

1212
# Search speed [tune-for-search-speed]
1313

14+
This page provides guidance on tuning {{es}} for faster search performance. While hardware and system-level settings play an important role, the structure of your documents and the design of your queries often have the biggest impact. Use these recommendations to optimize field mappings, caching behavior, and query design for high-throughput, low-latency search at scale.
15+
16+
::::{note}
17+
Search performance in {{es}} depends on a combination of factors, including how expensive individual queries are, how many searches run in parallel, the number of indices and shards involved, and the overall sharding strategy and shard size. These variables influence how the system should be tuned—for example, optimizing for a small number of complex queries differs significantly from optimizing for many lightweight, concurrent searches.
18+
Also consider your cluster’s shard count, index layout, and overall data distribution when tuning for search speed. Refer to [](./size-shards.md) for more details about sharding strategies and recommendations.
19+
::::
20+
1421

1522
## Give memory to the filesystem cache [_give_memory_to_the_filesystem_cache_2]
23+
```yaml {applies_to}
24+
deployment:
25+
self: all
26+
eck: all
27+
```
28+
29+
{{es}} heavily relies on the filesystem cache in order to make search fast. In general, you should make sure that at least half the available memory goes to the filesystem cache so that {{es}} can keep hot regions of the index in physical memory.
1630
17-
Elasticsearch heavily relies on the filesystem cache in order to make search fast. In general, you should make sure that at least half the available memory goes to the filesystem cache so that Elasticsearch can keep hot regions of the index in physical memory.
31+
By default, {{es}} automatically sets its [JVM heap size](/deploy-manage/deploy/self-managed/important-settings-configuration.md#heap-size-settings) to follow this best practice. However, in self-managed or {{eck}} deployments, you have the flexibility to allocate even more memory to the filesystem cache, which can lead to performance improvements depending on your workload.
1832
33+
::::{note}
34+
On Linux, the filesystem cache uses any memory not actively used by applications. To allocate memory to the cache, ensure that enough system memory remains available and is not consumed by {{es}} or other processes.
35+
::::
1936
2037
## Avoid page cache thrashing by using modest readahead values on Linux [_avoid_page_cache_thrashing_by_using_modest_readahead_values_on_linux]
38+
```yaml {applies_to}
39+
deployment:
40+
self: all
41+
eck: all
42+
ece: all
43+
```
2144
2245
Search can cause a lot of randomized read I/O. When the underlying block device has a high readahead value, there may be a lot of unnecessary read I/O done, especially when files are accessed using memory mapping (see [storage types](elasticsearch://reference/elasticsearch/index-settings/store.md#file-system)).
2346
@@ -29,16 +52,25 @@ You can check the current value in `KiB` using `lsblk -o NAME,RA,MOUNTPOINT,TYPE
2952
`blockdev` expects values in 512 byte sectors whereas `lsblk` reports values in `KiB`. As an example, to temporarily set readahead to `128KiB` for `/dev/nvme0n1`, specify `blockdev --setra 256 /dev/nvme0n1`.
3053
::::
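The unit difference in the note above is easy to get wrong, so here is a small sketch of the conversion. The helper names are hypothetical; the only facts used are that `blockdev --setra` counts 512-byte sectors and `lsblk` reports KiB.

```python
SECTOR_BYTES = 512  # blockdev --setra counts in 512-byte sectors
KIB_BYTES = 1024    # lsblk reports readahead (RA) in KiB


def kib_to_sectors(kib):
    """Convert a readahead size in KiB to the sector count blockdev expects."""
    return kib * KIB_BYTES // SECTOR_BYTES


def sectors_to_kib(sectors):
    """Convert a blockdev sector count back to the KiB value lsblk shows."""
    return sectors * SECTOR_BYTES // KIB_BYTES


print(kib_to_sectors(128))  # 256, matching `blockdev --setra 256` for 128KiB
```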
3154

32-
55+
The disk readahead cannot be adjusted in {{ech}} because it is controlled at the Linux kernel level. However, it can be modified in {{ece}}, Kubernetes, or self-managed nodes.
3356

3457
## Use faster hardware [search-use-faster-hardware]
3558

3659
If your searches are I/O-bound, consider increasing the size of the filesystem cache (see above) or using faster storage. Each search involves a mix of sequential and random reads across multiple files, and there may be many searches running concurrently on each shard, so SSD drives tend to perform better than spinning disks.
3760

3861
If your searches are CPU-bound, consider using a larger number of faster CPUs.
3962

63+
::::{note}
64+
In {{ech}} and {{ece}}, you can choose the underlying hardware by selecting different hardware profiles or deployment templates. Refer to [ECH>Change hardware](/deploy-manage/deploy/elastic-cloud/change-hardware.md) and [ECE deployment templates](/deploy-manage/deploy/cloud-enterprise/configure-deployment-templates.md) for more details.
65+
::::
4066

4167
### Local vs. remote storage [_local_vs_remote_storage_2]
68+
```yaml {applies_to}
69+
deployment:
70+
self: all
71+
eck: all
72+
ece: all
73+
```
4274

4375
Directly-attached (local) storage generally performs better than remote storage because it is simpler to configure well and avoids communications overheads.
4476

@@ -80,7 +112,6 @@ PUT movies
80112
}
81113
```
82114

83-
84115
## Pre-index data [_pre_index_data]
85116

86117
You should leverage patterns in your queries to optimize the way data is indexed. For instance, if all your documents have a `price` field and most queries run [`range`](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-range-aggregation.md) aggregations on a fixed list of ranges, you could make this aggregation faster by pre-indexing the ranges into the index and using a [`terms`](elasticsearch://reference/data-analysis/aggregations/search-aggregations-bucket-terms-aggregation.md) aggregation.
@@ -152,7 +183,6 @@ GET index/_search
152183
}
153184
```
154185

155-
156186
## Consider mapping identifiers as `keyword` [map-ids-as-keyword]
157187

158188
Not all numeric data should be mapped as a [numeric](elasticsearch://reference/elasticsearch/mapping-reference/number.md) field data type. {{es}} optimizes numeric fields, such as `integer` or `long`, for [`range`](elasticsearch://reference/query-languages/query-dsl-range-query.md) queries. However, [`keyword`](elasticsearch://reference/elasticsearch/mapping-reference/keyword.md) fields are better for [`term`](elasticsearch://reference/query-languages/query-dsl-term-query.md) and other [term-level](elasticsearch://reference/query-languages/term-level-queries.md) queries.
@@ -309,7 +339,6 @@ Loading data into the filesystem cache eagerly on too many indices or too many f
309339
::::
310340

311341

312-
313342
## Use index sorting to speed up conjunctions [_use_index_sorting_to_speed_up_conjunctions]
314343

315344
[Index sorting](elasticsearch://reference/elasticsearch/index-settings/sorting.md) can be useful in order to make conjunctions faster at the cost of slightly slower indexing. Read more about it in the [index sorting documentation](elasticsearch://reference/elasticsearch/index-settings/sorting-conjunctions.md).
