docs/use-cases/observability/clickstack/migration/elastic/concepts.md
+16 -16 (16 additions & 16 deletions)
@@ -141,7 +141,7 @@ Index routing in Elasticsearch ensures specific documents are always routed to s
While both systems support the aggregation of data, ClickHouse offers significantly [more functions](/sql-reference/aggregate-functions/reference), including statistical, approximate, and specialized analytical functions.
-
In observability use cases, one of the most common applications for aggregations is to count how often specific log messages or events occur (and alerting in case the frequency is unusual).
+
In observability use cases, one of the most common applications for aggregations is to count how often specific log messages or events occur (and alert in case the frequency is unusual).
The equivalent to a ClickHouse `SELECT count(*) FROM ... GROUP BY ...` SQL query in Elasticsearch is the [terms aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html), which is an Elasticsearch [bucket aggregation](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket.html).
@@ -166,9 +166,9 @@ ClickHouse was designed to maximize efficiency on modern hardware. By default, C
On a single node, execution lanes split data into independent ranges allowing concurrent processing across CPU threads. This includes filtering, aggregation, and sorting. The local results from each lane are eventually merged, and a limit operator is applied, in case the query features a limit clause.
Query execution is further parallelized by:
-
1. **SIMD vectorization**: Operations on columnar data use [CPU SIMD instructions](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) (e.g., [AVX512](https://en.wikipedia.org/wiki/AVX-512)), allowing batch processing of values.
-
2. **Cluster-level parallelism**: In distributed setups, each node performs query processing locally. [Partial aggregation states](https://clickhouse.com/blog/aggregate-functions-combinators-in-clickhouse-for-arrays-maps-and-states#working-with-aggregation-states) are streamed to the initiating node and merged. If the query's `GROUP BY` keys align with the [sharding keys](/architecture/horizontal-scaling#shard), merging can be [minimized or avoided entirely](/operations/settings/settings#distributed_group_by_no_merge).
-
+
1. **SIMD vectorization**: operations on columnar data use [CPU SIMD instructions](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) (e.g., [AVX512](https://en.wikipedia.org/wiki/AVX-512)), allowing batch processing of values.
+
2. **Cluster-level parallelism**: in distributed setups, each node performs query processing locally. [Partial aggregation states](https://clickhouse.com/blog/aggregate-functions-combinators-in-clickhouse-for-arrays-maps-and-states#working-with-aggregation-states) are streamed to the initiating node and merged. If the query's `GROUP BY` keys align with the [sharding keys](/architecture/horizontal-scaling#shard), merging can be [minimized or avoided entirely](/operations/settings/settings#distributed_group_by_no_merge).
+
<br/>
This model enables efficient scaling across cores and nodes, making ClickHouse well-suited for large-scale analytics. The use of *partial aggregation states* allows intermediate results from different threads and nodes to be merged without loss of accuracy.
Elasticsearch, by contrast, assigns one thread per shard for most aggregations, regardless of how many CPU cores are available. These threads return shard-local top-N results, which are merged at the coordinating node. This approach can underutilize system resources and introduce potential inaccuracies in global aggregations, particularly when frequent terms are distributed across multiple shards. Accuracy can be improved by increasing the `shard_size` parameter, but this comes at the cost of higher memory usage and query latency.
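The accuracy difference described above can be illustrated with a small model. The following is a minimal Python sketch, not ClickHouse or Elasticsearch code: the shard contents, the function names `exact_merge` and `top_n_merge`, and the error-message terms are all hypothetical. It compares merging complete per-shard counts (which behave like lossless partial aggregation states) with merging only each shard's local top-N terms.

```python
from collections import Counter


def exact_merge(shards):
    """Merge the complete per-shard counts (a lossless partial state)."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return total


def top_n_merge(shards, shard_size):
    """Merge only the `shard_size` most frequent terms of each shard."""
    total = Counter()
    for shard in shards:
        total.update(dict(shard.most_common(shard_size)))
    return total


# Hypothetical per-shard counts of log messages.
shards = [
    Counter({"timeout": 50, "disk_full": 40, "oom": 30}),
    Counter({"oom": 45, "timeout": 20, "disk_full": 5}),
]

exact = exact_merge(shards)
approx = top_n_merge(shards, shard_size=2)

print(exact.most_common(1))   # [('oom', 75)]
print(approx.most_common(1))  # [('timeout', 70)]
```

In this toy data set, `oom` is the most frequent term globally, but it falls outside the first shard's local top 2, so the truncated merge undercounts it and reports `timeout` instead. Raising `shard_size` to 3 makes the truncated result match the exact one here, at the cost of shipping more data per shard.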
@@ -219,17 +219,17 @@ We sketch the mechanics of incremental materialized views abstractly (note that
In the diagram above, the materialized view's source table already contains a data part storing some `blue` rows (1 to 10) belonging to the same group. For this group, there also already exists a data part in the view's target table storing a [partial aggregation state](https://www.youtube.com/watch?v=QDAJTKZT8y4) for the `blue` group. When ① ② ③ inserts into the source table with new rows take place, a corresponding source table data part is created for each insert, and, in parallel, (just) for each block of newly inserted rows, a partial aggregation state is calculated and inserted in the form of a data part into the materialized view's target table. ④ During background part merges, the partial aggregation states are merged, resulting in incremental data aggregation.
-
Note that all [over 90 aggregate functions](https://clickhouse.com/docs/en/sql-reference/aggregate-functions/reference), including their combination with aggregate function [combinators](https://www.youtube.com/watch?v=7ApwD0cfAFI), support [partial aggregation states](https://clickhouse.com/docs/en/sql-reference/data-types/aggregatefunction).
+
Note that all [aggregate functions](https://clickhouse.com/docs/en/sql-reference/aggregate-functions/reference) (over 90 of them), including their combinations with aggregate function [combinators](https://www.youtube.com/watch?v=7ApwD0cfAFI), support [partial aggregation states](https://clickhouse.com/docs/en/sql-reference/data-types/aggregatefunction).
-
You can see a more concrete example of Elasticsearch vs ClickHouse for incremental aggregates, see [here](https://github.com/ClickHouse/examples/tree/main/blog-examples/clickhouse-vs-elasticsearch/continuous-data-transformation#continuous-data-transformation-example).
+
For a more concrete example of Elasticsearch vs ClickHouse for incremental aggregates, see this [example](https://github.com/ClickHouse/examples/tree/main/blog-examples/clickhouse-vs-elasticsearch/continuous-data-transformation#continuous-data-transformation-example).
The advantages of ClickHouse's approach include:
-
- **Always-up-to-date aggregates**: Materialized views are always in sync with the source table.
-
- **No background jobs**: Aggregation is pushed to insert time rather than query time.
-
- **Better real-time performance**: Ideal for observability workloads and real-time analytics where fresh aggregates are required instantly.
-
- **Composable**: Materialized views can be layered or joined with other views and tables for more complex query acceleration strategies.
-
- **Different TTLs**: Different TTL settings can be applied to the source table and target table of the materialized view.
+
- **Always-up-to-date aggregates**: materialized views are always in sync with the source table.
+
- **No background jobs**: aggregations are pushed to insert time rather than query time.
+
- **Better real-time performance**: ideal for observability workloads and real-time analytics where fresh aggregates are required instantly.
+
- **Composable**: materialized views can be layered or joined with other views and tables for more complex query acceleration strategies.
+
- **Different TTLs**: different TTL settings can be applied to the source table and target table of the materialized view.
This model is particularly powerful for observability use cases where users need to compute metrics such as per-minute error rates, latencies, or top-N breakdowns without scanning billions of raw records per query.
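To make the idea of incremental aggregation concrete, here is a minimal Python sketch of how partial aggregation states can be computed per insert block and merged later. It is a conceptual model only: the `AvgState` class, its methods, and the sample latency values are hypothetical and do not reflect ClickHouse's internal state format. It is loosely analogous to storing an average as an intermediate state rather than as a final value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvgState:
    """Partial state for an average: keeps sum and count, not the mean."""
    total: float = 0.0
    count: int = 0

    @classmethod
    def from_block(cls, values):
        # Computed once at insert time, from the newly inserted rows only.
        return cls(total=sum(values), count=len(values))

    def merge(self, other):
        # Combining two states never needs the original rows.
        return AvgState(self.total + other.total, self.count + other.count)

    def finalize(self):
        # Turn the state into the final aggregate value at query time.
        return self.total / self.count if self.count else None


# Three hypothetical insert blocks of latency values for one group.
blocks = [[10, 20, 30], [40], [50, 60]]

# One partial state per insert block, as a view's target table might store.
parts = [AvgState.from_block(block) for block in blocks]

# A background merge folds the states together.
merged = parts[0]
for part in parts[1:]:
    merged = merged.merge(part)

print(merged.finalize())  # 35.0
```

Storing the pair (sum, count) is what keeps the merge exact: averaging the three per-block means (20, 40, and 55) would give roughly 38.3, whereas merging the states gives the true mean of 35.0 over all six rows.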
@@ -241,10 +241,10 @@ In contrast, Elasticsearch is tightly coupled to its internal data format and Lu
ClickHouse's lakehouse capabilities extend beyond just reading data:
-
- **Data Catalog Integration**: ClickHouse supports integration with data catalogs like [AWS Glue](/use-cases/data-lake/glue-catalog), enabling automatic discovery and access to tables in object storage.
-
- **Object Storage Support**: Native support for querying data residing in [S3](/engines/table-engines/integrations/s3), [GCS](/sql-reference/table-functions/gcs), and [Azure Blob Storage](/engines/table-engines/integrations/azureBlobStorage) without requiring data movement.
-
- **Query Federation**: Ability to correlate data across multiple sources, including lakehouse tables, traditional databases, and ClickHouse tables using [external dictionaries](/dictionary) and [table functions](/sql-reference/table-functions).
-
- **Incremental Loading**: Support for continuous loading from lakehouse tables into local [MergeTree](/engines/table-engines/mergetree-family/mergetree) tables, using features like [S3Queue](/engines/table-engines/integrations/s3queue) and [ClickPipes](/integrations/clickpipes).
-
- **Performance Optimization**: Distributed query execution over lakehouse data using [cluster functions](/sql-reference/table-functions/cluster) for improved performance.
+
- **Data catalog integration**: ClickHouse supports integration with data catalogs like [AWS Glue](/use-cases/data-lake/glue-catalog), enabling automatic discovery and access to tables in object storage.
+
- **Object storage support**: native support for querying data residing in [S3](/engines/table-engines/integrations/s3), [GCS](/sql-reference/table-functions/gcs), and [Azure Blob Storage](/engines/table-engines/integrations/azureBlobStorage) without requiring data movement.
+
- **Query federation**: the ability to correlate data across multiple sources, including lakehouse tables, traditional databases, and ClickHouse tables using [external dictionaries](/dictionary) and [table functions](/sql-reference/table-functions).
+
- **Incremental loading**: support for continuous loading from lakehouse tables into local [MergeTree](/engines/table-engines/mergetree-family/mergetree) tables, using features like [S3Queue](/engines/table-engines/integrations/s3queue) and [ClickPipes](/integrations/clickpipes).
+
- **Performance optimization**: distributed query execution over lakehouse data using [cluster functions](/sql-reference/table-functions/cluster) for improved performance.
These capabilities make ClickHouse a natural fit for organizations adopting lakehouse architectures, allowing them to leverage both the flexibility of data lakes and the performance of a columnar database.