Commit 7243b7b

combing through and removing unnecessary changes to formatting
1 parent 5b4abe0 commit 7243b7b

427 files changed: +13992 -12191 lines changed

docs/_snippets/_config-files.md

Lines changed: 2 additions & 2 deletions
@@ -3,5 +3,5 @@ When configuring ClickHouse Server by adding or editing configuration files you
 - Add files to `/etc/clickhouse-server/config.d/` directory
 - Add files to `/etc/clickhouse-server/users.d/` directory
 - Leave the `/etc/clickhouse-server/config.xml` file as it is
-- Leave the `/etc/clickhouse-server/users.xml` file as it is
-:::
+- Leave the `/etc/clickhouse-server/users.xml` file as it is
+:::

docs/_snippets/_tabs.md

Lines changed: 4 additions & 0 deletions
@@ -10,9 +10,13 @@ import CodeBlock from '@theme/CodeBlock';

 <Tabs groupId="deployMethod">
 <TabItem value="serverless" label="ClickHouse Cloud" default>
+
 Cloud
+
 </TabItem>
 <TabItem value="selfmanaged" label="Self-managed">
+
 Self-managed
+
 </TabItem>
 </Tabs>

docs/about-us/beta-and-experimental-features.md

Lines changed: 9 additions & 9 deletions
@@ -14,19 +14,19 @@ Due to the uncertainty of when features are classified as generally available, w

 The sections below explicitly describe the properties of **Beta** and **Experimental** features:

-## Beta features {#beta-features}
+## Beta Features {#beta-features}

 - Under active development to make them generally available (GA)
 - Main known issues can be tracked on GitHub
 - Functionality may change in the future
 - Possibly enabled in ClickHouse Cloud
 - The ClickHouse team supports beta features

-You can find below the features considered Beta in ClickHouse Cloud and are available for use in your ClickHouse Cloud Services.
+The following features are considered Beta in ClickHouse Cloud and are available for use in ClickHouse Cloud Services, even though they may be currently under a ClickHouse SETTING named ```allow_experimental_*```:

-Note: please be sure to be using a current version of the ClickHouse [compatibility](/operations/settings/settings#compatibility) setting to be using a recently introduced feature.
+Note: please be sure to be using a current version of the ClickHouse [compatibility](/operations/settings/settings#compatibility) setting to be using a recently introduced feature.

-## Experimental features {#experimental-features}
+## Experimental Features {#experimental-features}

 - May never become GA
 - May be removed
@@ -37,11 +37,11 @@ The sections below explicitly describe the properties of **Beta** and **Experime
 - May lack important functionality and documentation
 - Cannot be enabled in the cloud

-Please note: no additional experimental features are allowed to be enabled in ClickHouse Cloud other than those listed above as Beta.
+Please note: no additional experimental features are allowed to be enabled in ClickHouse Cloud other than those listed above as Beta.

-<!-- The inner content of the tags below are replaced at build time with a table generated from source
+<!-- The inner content of the tags below are replaced at build time with a table generated from source
 Please do not modify or remove the tags
--->
+-->

-<!--AUTOGENERATED_START-->
-<!--AUTOGENERATED_END-->
+<!--AUTOGENERATED_START-->
+<!--AUTOGENERATED_END-->

docs/about-us/cloud.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ title: 'ClickHouse Cloud'

 # ClickHouse Cloud

-ClickHouse Cloud is the cloud offering created by the original creators of the popular open-source OLAP database ClickHouse.
+ClickHouse Cloud is the cloud offering created by the original creators of the popular open-source OLAP database ClickHouse.
 You can experience ClickHouse Cloud by [starting a free trial](https://console.clickhouse.cloud/signUp).

 ## ClickHouse Cloud benefits {#clickhouse-cloud-benefits}

docs/about-us/distinctive-features.md

Lines changed: 22 additions & 22 deletions
@@ -7,81 +7,81 @@ title: 'Distinctive Features of ClickHouse'
 keywords: ['compression', 'secondary-indexes','column-oriented']
 ---

-# Distinctive features of ClickHouse
+# Distinctive Features of ClickHouse

-## True column-oriented database management system {#true-column-oriented-database-management-system}
+## True Column-Oriented Database Management System {#true-column-oriented-database-management-system}

 In a real column-oriented DBMS, no extra data is stored with the values. This means that constant-length values must be supported to avoid storing their length "number" next to the values. For example, a billion UInt8-type values should consume around 1 GB uncompressed, or this strongly affects the CPU use. It is essential to store data compactly (without any "garbage") even when uncompressed since the speed of decompression (CPU usage) depends mainly on the volume of uncompressed data.

 This is in contrast to systems that can store values of different columns separately, but that cannot effectively process analytical queries due to their optimization for other scenarios, such as HBase, Bigtable, Cassandra, and Hypertable. You would get throughput around a hundred thousand rows per second in these systems, but not hundreds of millions of rows per second.

 Finally, ClickHouse is a database management system, not a single database. It allows creating tables and databases in runtime, loading data, and running queries without reconfiguring and restarting the server.

-## Data compression {#data-compression}
+## Data Compression {#data-compression}

 Some column-oriented DBMSs do not use data compression. However, data compression plays a key role in achieving excellent performance.

 In addition to efficient general-purpose compression codecs with different trade-offs between disk space and CPU consumption, ClickHouse provides [specialized codecs](/sql-reference/statements/create/table.md#specialized-codecs) for specific kinds of data, which allow ClickHouse to compete with and outperform more niche databases, like time-series ones.

-## Disk storage of data {#disk-storage-of-data}
+## Disk Storage of Data {#disk-storage-of-data}

 Keeping data physically sorted by primary key makes it possible to extract data based on specific values or value ranges with low latency in less than a few dozen milliseconds. Some column-oriented DBMSs, such as SAP HANA and Google PowerDrill, can only work in RAM. This approach requires allocation of a larger hardware budget than necessary for real-time analysis.

 ClickHouse is designed to work on regular hard drives, which means the cost per GB of data storage is low, but SSD and additional RAM are also fully used if available.

-## Parallel processing on multiple cores {#parallel-processing-on-multiple-cores}
+## Parallel Processing on Multiple Cores {#parallel-processing-on-multiple-cores}

 Large queries are parallelized naturally, taking all the necessary resources available on the current server.

-## Distributed processing on multiple servers {#distributed-processing-on-multiple-servers}
+## Distributed Processing on Multiple Servers {#distributed-processing-on-multiple-servers}

 Almost none of the columnar DBMSs mentioned above have support for distributed query processing.

 In ClickHouse, data can reside on different shards. Each shard can be a group of replicas used for fault tolerance. All shards are used to run a query in parallel, transparently for the user.

-## SQL support {#sql-support}
+## SQL Support {#sql-support}

 ClickHouse supports [SQL language](/sql-reference/) that is mostly compatible with the ANSI SQL standard.

 Supported queries include [GROUP BY](../sql-reference/statements/select/group-by.md), [ORDER BY](../sql-reference/statements/select/order-by.md), subqueries in [FROM](../sql-reference/statements/select/from.md), [JOIN](../sql-reference/statements/select/join.md) clause, [IN](../sql-reference/operators/in.md) operator, [window functions](../sql-reference/window-functions/index.md) and scalar subqueries.

 Correlated (dependent) subqueries are not supported at the time of writing but might become available in the future.

-## Vector computation engine {#vector-engine}
+## Vector Computation Engine {#vector-engine}

 Data is not only stored by columns but is processed by vectors (parts of columns), which allows achieving high CPU efficiency.

-## Real-time data inserts {#real-time-data-updates}
+## Real-Time Data Inserts {#real-time-data-updates}

 ClickHouse supports tables with a primary key. To quickly perform queries on the range of the primary key, the data is sorted incrementally using the merge tree. Due to this, data can continually be added to the table. No locks are taken when new data is ingested.

-## Primary indexes {#primary-index}
+## Primary Indexes {#primary-index}

 Having data physically sorted by primary key makes it possible to extract data based on specific values or value ranges with low latency in less than a few dozen milliseconds.

-## Secondary indexes {#secondary-indexes}
+## Secondary Indexes {#secondary-indexes}

 Unlike other database management systems, secondary indexes in ClickHouse do not point to specific rows or row ranges. Instead, they allow the database to know in advance that all rows in some data parts would not match the query filtering conditions and do not read them at all, thus they are called [data skipping indexes](../engines/table-engines/mergetree-family/mergetree.md#table_engine-mergetree-data_skipping-indexes).

-## Suitable for online queries {#suitable-for-online-queries}
+## Suitable for Online Queries {#suitable-for-online-queries}

 Most OLAP database management systems do not aim for online queries with sub-second latencies. In alternative systems, report building time of tens of seconds or even minutes is often considered acceptable. Sometimes it takes even more time, which forces systems to prepare reports offline (in advance or by responding with "come back later").

 In ClickHouse "low latency" means that queries can be processed without delay and without trying to prepare an answer in advance, right at the same moment as the user interface page is loading. In other words, online.

-## Support for approximated calculations {#support-for-approximated-calculations}
+## Support for Approximated Calculations {#support-for-approximated-calculations}

 ClickHouse provides various ways to trade accuracy for performance:

-1. Aggregate functions for approximated calculation of the number of distinct values, medians, and quantiles.
-2. Running a query based on a part ([SAMPLE](../sql-reference/statements/select/sample.md)) of data and getting an approximated result. In this case, proportionally less data is retrieved from the disk.
-3. Running an aggregation for a limited number of random keys, instead of for all keys. Under certain conditions for key distribution in the data, this provides a reasonably accurate result while using fewer resources.
+1. Aggregate functions for approximated calculation of the number of distinct values, medians, and quantiles.
+2. Running a query based on a part ([SAMPLE](../sql-reference/statements/select/sample.md)) of data and getting an approximated result. In this case, proportionally less data is retrieved from the disk.
+3. Running an aggregation for a limited number of random keys, instead of for all keys. Under certain conditions for key distribution in the data, this provides a reasonably accurate result while using fewer resources.

-## Adaptive join algorithm {#adaptive-join-algorithm}
+## Adaptive Join Algorithm {#adaptive-join-algorithm}

 ClickHouse adaptively chooses how to [JOIN](../sql-reference/statements/select/join.md) multiple tables, by preferring hash-join algorithm and falling back to the merge-join algorithm if there's more than one large table.

-## Data replication and data integrity support {#data-replication-and-data-integrity-support}
+## Data Replication and Data Integrity Support {#data-replication-and-data-integrity-support}

 ClickHouse uses asynchronous multi-master replication. After being written to any available replica, all the remaining replicas retrieve their copy in the background. The system maintains identical data on different replicas. Recovery after most failures is performed automatically, or semi-automatically in complex cases.

@@ -91,8 +91,8 @@ For more information, see the section [Data replication](../engines/table-engine

 ClickHouse implements user account management using SQL queries and allows for [role-based access control configuration](/guides/sre/user-management/index.md) similar to what can be found in ANSI SQL standard and popular relational database management systems.

-## Features that can be considered disadvantages {#clickhouse-features-that-can-be-considered-disadvantages}
+## Features that Can Be Considered Disadvantages {#clickhouse-features-that-can-be-considered-disadvantages}

-1. No full-fledged transactions.
-2. Lack of ability to modify or delete already inserted data with a high rate and low latency. There are batch deletes and updates available to clean up or modify data, for example, to comply with [GDPR](https://gdpr-info.eu).
-3. The sparse index makes ClickHouse not so efficient for point queries retrieving single rows by their keys.
+1. No full-fledged transactions.
+2. Lack of ability to modify or delete already inserted data with a high rate and low latency. There are batch deletes and updates available to clean up or modify data, for example, to comply with [GDPR](https://gdpr-info.eu).
+3. The sparse index makes ClickHouse not so efficient for point queries retrieving single rows by their keys.

docs/about-us/history.md

Lines changed: 10 additions & 10 deletions
@@ -7,15 +7,15 @@ keywords: ['history','development','Metrica']
 title: 'ClickHouse History'
 ---

-# ClickHouse history {#clickhouse-history}
+# ClickHouse History {#clickhouse-history}

 ClickHouse was initially developed to power [Yandex.Metrica](https://metrica.yandex.com/), [the second largest web analytics platform in the world](http://w3techs.com/technologies/overview/traffic_analysis/all), and continues to be its core component. With more than 13 trillion records in the database and more than 20 billion events daily, ClickHouse allows generating custom reports on the fly directly from non-aggregated data. This article briefly covers the goals of ClickHouse in the early stages of its development.

 Yandex.Metrica builds customized reports on the fly based on hits and sessions, with arbitrary segments defined by the user. Doing so often requires building complex aggregates, such as the number of unique users, with new data for building reports arriving in real-time.

 As of April 2014, Yandex.Metrica was tracking about 12 billion events (page views and clicks) daily. All these events needed to be stored, in order to build custom reports. A single query may have required scanning millions of rows within a few hundred milliseconds, or hundreds of millions of rows in just a few seconds.

-## Usage in Yandex.Metrica and other Yandex services {#usage-in-yandex-metrica-and-other-yandex-services}
+## Usage in Yandex.Metrica and Other Yandex Services {#usage-in-yandex-metrica-and-other-yandex-services}

 ClickHouse serves multiple purposes in Yandex.Metrica.
 Its main task is to build reports in online mode using non-aggregated data. It uses a cluster of 374 servers, which store over 20.3 trillion rows in the database. The volume of compressed data is about 2 PB, without accounting for duplicates and replicas. The volume of uncompressed data (in TSV format) would be approximately 17 PB.
@@ -28,9 +28,9 @@ ClickHouse also plays a key role in the following processes:
 - Running queries for debugging the Yandex.Metrica engine.
 - Analyzing logs from the API and the user interface.

-Nowadays, there are a multiple dozen ClickHouse installations in other Yandex services and departments: search verticals, e-commerce, advertisement, business analytics, mobile development, personal services, and others.
+Nowadays, there are a multiple dozen ClickHouse installations in other Yandex services and departments: search verticals, e-commerce, advertisement, business analytics, mobile development, personal services, and others.

-## Aggregated and non-aggregated data {#aggregated-and-non-aggregated-data}
+## Aggregated and Non-aggregated Data {#aggregated-and-non-aggregated-data}

 There is a widespread opinion that to calculate statistics effectively, you must aggregate data since this reduces the volume of data.

@@ -45,12 +45,12 @@ However data aggregation comes with a lot of limitations:
 - Users do not view all the reports we generate for them. A large portion of those calculations are useless.
 - The logical integrity of the data may be violated for various aggregations.

-If we do not aggregate anything and work with non-aggregated data, this might reduce the volume of calculations.
+If we do not aggregate anything and work with non-aggregated data, this might reduce the volume of calculations.

-However, with aggregation, a significant part of the work is taken offline and completed relatively calmly. In contrast, online calculations require calculating as fast as possible, since the user is waiting for the result.
+However, with aggregation, a significant part of the work is taken offline and completed relatively calmly. In contrast, online calculations require calculating as fast as possible, since the user is waiting for the result.

-Yandex.Metrica has a specialized system for aggregating data called Metrage, which was used for the majority of reports.
-Starting in 2009, Yandex.Metrica also used a specialized OLAP database for non-aggregated data called OLAPServer, which was previously used for the report builder.
-OLAPServer worked well for non-aggregated data, but it had many restrictions that did not allow it to be used for all reports as desired. These included a lack of support for data types (numbers only), and the inability to incrementally update data in real-time (it could only be done by rewriting data daily). OLAPServer is not a DBMS, but a specialized DB.
+Yandex.Metrica has a specialized system for aggregating data called Metrage, which was used for the majority of reports.
+Starting in 2009, Yandex.Metrica also used a specialized OLAP database for non-aggregated data called OLAPServer, which was previously used for the report builder.
+OLAPServer worked well for non-aggregated data, but it had many restrictions that did not allow it to be used for all reports as desired. These included a lack of support for data types (numbers only), and the inability to incrementally update data in real-time (it could only be done by rewriting data daily). OLAPServer is not a DBMS, but a specialized DB.

-The initial goal for ClickHouse was to remove the limitations of OLAPServer and solve the problem of working with non-aggregated data for all reports, but over the years, it has grown into a general-purpose database management system suitable for a wide range of analytical tasks.
+The initial goal for ClickHouse was to remove the limitations of OLAPServer and solve the problem of working with non-aggregated data for all reports, but over the years, it has grown into a general-purpose database management system suitable for a wide range of analytical tasks.
