Commit bb9559d

Update docs/use-cases/observability/clickstack/migration/elastic/concepts.md
Co-authored-by: Shaun Struwig <[email protected]>
Parent: d27a80a · Commit: bb9559d

File tree

1 file changed: +1 −1 lines changed

  • docs/use-cases/observability/clickstack/migration/elastic
docs/use-cases/observability/clickstack/migration/elastic/concepts.md

Lines changed: 1 addition & 1 deletion
@@ -90,7 +90,7 @@ The concept of sharding is fundamental to Elasticsearch's scalability model. Eac
 Ⓐ Newly inserted documents Ⓑ first go into an in-memory indexing buffer that is flushed by default once per second. A routing formula is used to determine the target shard for flushed documents, and a new segment is written for the shard on disk. To improve query efficiency and enable the physical deletion of deleted or updated documents, segments are continuously merged in the background into larger segments until they reach a max size of 5 GB. It is, however, possible to force a merge into larger segments.
 :::
 
-Elasticsearch recommends sizing shards to around [50 GB or 200 million documents](https://www.elastic.co/docs/deploy-manage/production-guidance/optimize-performance/size-shards) due to [JVM heap and metadata overhead](https://www.elastic.co/docs/deploy-manage/production-guidance/optimize-performance/size-shards#each-shard-has-overhead). There's also a hard limit of [2 billion documents per shard](https://www.elastic.co/docs/deploy-manage/production-guidance/optimize-performance/size-shards#troubleshooting-max-docs-limit). Elasticsearch parallelizes queries across shards, but each shard is processed using a **single thread**, making over-sharding both costly and counterproductive. This inherently makes sharding tightly coupled to scaling, with more shards (and nodes) required to scale performance.
+Elasticsearch recommends sizing shards to around [50 GB or 200 million documents](https://www.elastic.co/docs/deploy-manage/production-guidance/optimize-performance/size-shards) due to [JVM heap and metadata overhead](https://www.elastic.co/docs/deploy-manage/production-guidance/optimize-performance/size-shards#each-shard-has-overhead). There's also a hard limit of [2 billion documents per shard](https://www.elastic.co/docs/deploy-manage/production-guidance/optimize-performance/size-shards#troubleshooting-max-docs-limit). Elasticsearch parallelizes queries across shards, but each shard is processed using a **single thread**, making over-sharding both costly and counterproductive. This inherently tightly couples sharding to scaling, with more shards (and nodes) required to scale performance.
 
 Elasticsearch indexes all fields into [**inverted indices**](https://www.elastic.co/docs/manage-data/data-store/index-basics) for fast search, optionally using [**doc values**](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/doc-values) for aggregations, sorting and scripted field access. Numeric and geo fields use [Block K-D trees](https://users.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf) for searches on geospatial data and numeric and date ranges.
 

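The segment-flush passage in the diff above mentions a routing formula that decides the target shard for each document. A minimal sketch of that idea in Python — Elasticsearch actually hashes with Murmur3 and a routing factor, so `hashlib.md5` and the function name here are illustrative stand-ins, not the real implementation:

```python
import hashlib

def route_to_shard(routing_key: str, num_primary_shards: int) -> int:
    # Deterministically hash the routing key (the document _id by
    # default) and reduce it modulo the primary shard count.
    digest = hashlib.md5(routing_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_primary_shards

# The same key always routes to the same shard, which is why the
# primary shard count cannot change after index creation.
assert route_to_shard("doc-1", 5) == route_to_shard("doc-1", 5)
assert 0 <= route_to_shard("doc-42", 5) < 5
```

Because routing is a pure function of the key and the shard count, changing the number of primary shards would invalidate every previous placement, hence the need for reindexing to resize.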
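The ~50 GB sizing guidance quoted in the diff implies a simple back-of-envelope estimate of how many shards a dataset needs. A sketch under that assumption — `shards_needed` and its default are hypothetical helpers for illustration, not an Elasticsearch API:

```python
import math

def shards_needed(total_gb: float, target_shard_gb: float = 50.0) -> int:
    # Round up: even a small remainder needs its own shard.
    return math.ceil(total_gb / target_shard_gb)

# 10 TB of data at the recommended ~50 GB per shard.
assert shards_needed(10_000) == 200
assert shards_needed(120) == 3
```

Since each shard is queried by a single thread, the same arithmetic also bounds per-query parallelism, which is why over- and under-sharding are both costly.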
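The final context line of the diff describes inverted indices mapping terms to documents. A toy sketch of that structure — a real Lucene index adds positions, frequencies, and compressed postings, none of which are modeled here:

```python
from collections import defaultdict

def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    # Map each term to the sorted ids of the documents containing it.
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

docs = {1: "error in payment service", 2: "payment retry succeeded"}
index = build_inverted_index(docs)
assert index["payment"] == [1, 2]  # term lookup is a direct map access
assert index["retry"] == [2]
```

Term lookup never scans documents, which is what makes full-text search fast; doc values serve the opposite access pattern (document to value) for aggregations and sorting.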
0 commit comments
