
Commit 07dd4c1

Apply suggestions from code review
Co-authored-by: Shaun Struwig <[email protected]>
1 parent c541f50 commit 07dd4c1

File tree

1 file changed (+1, -1 lines changed)

  • docs/use-cases/observability/clickstack/migration/elastic

docs/use-cases/observability/clickstack/migration/elastic/concepts.md

Lines changed: 1 addition & 1 deletion
@@ -87,7 +87,7 @@ Both systems offer a REST API, but ClickHouse also provides a **native protocol**
 The concept of sharding is fundamental to Elasticsearch's scalability model. Each ① [**index**](https://www.elastic.co/blog/what-is-an-elasticsearch-index) is broken into **shards**, each of which is a physical Lucene index stored as segments on disk. A shard can have one or more physical copies called replica shards for resilience. For scalability, shards and replicas can be distributed over several nodes. A single shard ② consists of one or more immutable segments. A segment is the basic indexing structure of Lucene, the Java library providing the indexing and search features on which Elasticsearch is based.
 
 :::note Insert processing in Elasticsearch
-Ⓐ Newly inserted documents Ⓑ first go into an in-memory indexing buffer that is flushed by default once per second. A routing formula is used to determine the target shard for flushed documents, and a new segment is written for the shard on disk. To improve query efficiency and enable the physical deletion of deleted or updated documents, segments are continuously merged in the background into larger segments until they reach a max size of 5 GB. It is possible to force a merge into larger segments, though.
+Ⓐ Newly inserted documents Ⓑ first go into an in-memory indexing buffer that is flushed by default once per second. A routing formula is used to determine the target shard for flushed documents, and a new segment is written for the shard on disk. To improve query efficiency and enable the physical deletion of deleted or updated documents, segments are continuously merged in the background into larger segments until they reach a max size of 5 GB. It is, however, possible to force a merge into larger segments.
 :::
 
 Elasticsearch recommends sizing shards to around [50 GB or 200 million documents](https://www.elastic.co/docs/deploy-manage/production-guidance/optimize-performance/size-shards) due to [JVM heap and metadata overhead](https://www.elastic.co/docs/deploy-manage/production-guidance/optimize-performance/size-shards#each-shard-has-overhead). There's also a hard limit of [2 billion documents per shard](https://www.elastic.co/docs/deploy-manage/production-guidance/optimize-performance/size-shards#troubleshooting-max-docs-limit). Elasticsearch parallelizes queries across shards, but each shard is processed using a **single thread**, making over-sharding both costly and counterproductive. This inherently makes sharding tightly coupled to scaling, with more shards (and nodes) required to scale performance.
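For readers following the passage above: the shard/replica model and the ~50 GB-per-shard sizing guidance can be made concrete with the index-creation API. Below is a minimal sketch, assuming the Python requests library, an Elasticsearch node at http://localhost:9200, a hypothetical index name `logs-demo`, and an assumed 500 GB of expected data; none of these names or figures come from the commit itself.

```python
import math

import requests

ES = "http://localhost:9200"  # assumed local Elasticsearch node

# Rough sizing per the guidance quoted above: target ~50 GB per primary shard.
expected_data_gb = 500                             # assumed ingest volume
primary_shards = math.ceil(expected_data_gb / 50)  # -> 10 primary shards

# Create an index with explicit shard and replica counts.
resp = requests.put(
    f"{ES}/logs-demo",  # hypothetical index name
    json={
        "settings": {
            "number_of_shards": primary_shards,  # physical Lucene indexes
            "number_of_replicas": 1,             # one replica copy per shard
        }
    },
)
resp.raise_for_status()
print(resp.json())
```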
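The insert-processing note (the passage this commit edits) maps onto two index APIs: the refresh interval, which governs the once-per-second flush of the in-memory indexing buffer, and the `_forcemerge` endpoint, which merges segments beyond what background merging would produce. A minimal sketch under the same assumptions as above (local node, requests library, hypothetical `logs-demo` index):

```python
import requests

ES = "http://localhost:9200"  # assumed local Elasticsearch node
INDEX = "logs-demo"           # hypothetical index name

# The once-per-second flush described in the note is the index refresh
# interval; it can be relaxed to trade search freshness for cheaper ingest.
requests.put(
    f"{ES}/{INDEX}/_settings",
    json={"index": {"refresh_interval": "30s"}},
).raise_for_status()

# Inspect the shard's immutable Lucene segments.
print(requests.get(f"{ES}/_cat/segments/{INDEX}?v").text)

# Force a merge down to a single (larger) segment per shard.
requests.post(
    f"{ES}/{INDEX}/_forcemerge", params={"max_num_segments": 1}
).raise_for_status()
```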
