
Commit 46aba26

additional fixes
2 parents 90332d6 + 2ca1b50 commit 46aba26

File tree: 28 files changed, +96 -102 lines changed

docs/_snippets/_S3_authentication_and_bucket.md
Lines changed: 1 addition & 1 deletion

@@ -22,7 +22,7 @@ import s3_h from '@site/static/images/_snippets/s3/s3-h.png';
This article demonstrates the basics of how to configure an AWS IAM user, create an S3 bucket and configure ClickHouse to use the bucket as an S3 disk. You should work with your security team to determine the permissions to be used, and consider these as a starting point.
### Create an AWS IAM user {#create-an-aws-iam-user}
In this procedure, we'll be creating a service account user, not a login user.
- 1. Log into the AWS IAM Management Console.
+ 1. Log into the AWS IAM Management Console.
2. In "users", select **Add users**
<Image size="md" img={s3_1} alt="AWS IAM Management Console - Adding a new user" border force/>
3. Enter the user name and set the credential type to **Access key - Programmatic access** and select **Next: Permissions**
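A quick way to sanity-check the programmatic-access credentials created in this procedure, before wiring up the S3 disk, is the `s3` table function; a minimal sketch, assuming a hypothetical bucket and object (the bucket URL and key values below are placeholders, not from the commit):

```sql
-- Smoke test: read one object from the new bucket with the IAM user's key.
-- Bucket URL, access key ID, and secret are placeholders.
SELECT count()
FROM s3(
    'https://my-docs-bucket.s3.amazonaws.com/data/sample.csv',
    'AKIAEXAMPLEKEYID',     -- access key ID of the service account user
    'example-secret-key',   -- secret access key
    'CSVWithNames'
);
```

If this returns a row count instead of an access error, the key pair and bucket permissions line up with what the S3 disk configuration will need.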

docs/_snippets/_users-and-roles-common.md
Lines changed: 8 additions & 8 deletions

@@ -111,49 +111,49 @@ With this set of examples:

Roles are used to define groups of users for certain privileges instead of managing each user separately.

- 1. Create a role to restrict users of this role to only see `column1` in database `db1` and `table1`:
+ 1. Create a role to restrict users of this role to only see `column1` in database `db1` and `table1`:

```sql
CREATE ROLE column1_users;
```

- 2. Set privileges to allow view on `column1`
+ 2. Set privileges to allow view on `column1`

```sql
GRANT SELECT(id, column1) ON db1.table1 TO column1_users;
```

- 3. Add the `column_user` user to the `column1_users` role
+ 3. Add the `column_user` user to the `column1_users` role

```sql
GRANT column1_users TO column_user;
```

- 4. Create a role to restrict users of this role to only see selected rows, in this case, only rows containing `A` in `column1`
+ 4. Create a role to restrict users of this role to only see selected rows, in this case, only rows containing `A` in `column1`

```sql
CREATE ROLE A_rows_users;
```

- 5. Add the `row_user` to the `A_rows_users` role
+ 5. Add the `row_user` to the `A_rows_users` role

```sql
GRANT A_rows_users TO row_user;
```

- 6. Create a policy to allow view on only where `column1` has the values of `A`
+ 6. Create a policy to allow view on only where `column1` has the values of `A`

```sql
CREATE ROW POLICY A_row_filter ON db1.table1 FOR SELECT USING column1 = 'A' TO A_rows_users;
```

- 7. Set privileges to the database and table
+ 7. Set privileges to the database and table

```sql
GRANT SELECT(id, column1, column2) ON db1.table1 TO A_rows_users;
```

- 8. grant explicit permissions for other roles to still have access to all rows
+ 8. grant explicit permissions for other roles to still have access to all rows

```sql
CREATE ROW POLICY allow_other_users_filter
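After applying this snippet's statements, the resulting grants and row policies are easy to verify; a minimal sketch, assuming the `column_user`/`row_user` accounts and `db1.table1` from the surrounding docs already exist:

```sql
-- Confirm role memberships and the column-level grants created above.
SHOW GRANTS FOR column_user;  -- expect: GRANT column1_users TO column_user
SHOW GRANTS FOR row_user;     -- expect: GRANT A_rows_users TO row_user

-- Run as row_user: the A_row_filter policy should hide every row
-- whose column1 is not 'A'.
SELECT id, column1 FROM db1.table1;
```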

docs/about-us/distinctive-features.md
Lines changed: 6 additions & 6 deletions

@@ -73,9 +73,9 @@ In ClickHouse "low latency" means that queries can be processed without delay an

ClickHouse provides various ways to trade accuracy for performance:

- 1. Aggregate functions for approximated calculation of the number of distinct values, medians, and quantiles.
- 2. Running a query based on a part ([SAMPLE](../sql-reference/statements/select/sample.md)) of data and getting an approximated result. In this case, proportionally less data is retrieved from the disk.
- 3. Running an aggregation for a limited number of random keys, instead of for all keys. Under certain conditions for key distribution in the data, this provides a reasonably accurate result while using fewer resources.
+ 1. Aggregate functions for approximated calculation of the number of distinct values, medians, and quantiles.
+ 2. Running a query based on a part ([SAMPLE](../sql-reference/statements/select/sample.md)) of data and getting an approximated result. In this case, proportionally less data is retrieved from the disk.
+ 3. Running an aggregation for a limited number of random keys, instead of for all keys. Under certain conditions for key distribution in the data, this provides a reasonably accurate result while using fewer resources.

## Adaptive join algorithm {#adaptive-join-algorithm}

@@ -93,6 +93,6 @@ ClickHouse implements user account management using SQL queries and allows for [

## Features that can be considered disadvantages {#clickhouse-features-that-can-be-considered-disadvantages}

- 1. No full-fledged transactions.
- 2. Lack of ability to modify or delete already inserted data with a high rate and low latency. There are batch deletes and updates available to clean up or modify data, for example, to comply with [GDPR](https://gdpr-info.eu).
- 3. The sparse index makes ClickHouse not so efficient for point queries retrieving single rows by their keys.
+ 1. No full-fledged transactions.
+ 2. Lack of ability to modify or delete already inserted data with a high rate and low latency. There are batch deletes and updates available to clean up or modify data, for example, to comply with [GDPR](https://gdpr-info.eu).
+ 3. The sparse index makes ClickHouse not so efficient for point queries retrieving single rows by their keys.
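The accuracy-for-performance trade-offs in the first hunk's list map directly onto SQL; a minimal sketch, assuming a hypothetical `hits` table whose definition includes a `SAMPLE BY` clause (names are placeholders, not from the commit):

```sql
-- uniq() and quantile() are approximate by design; SAMPLE reads ~10% of the
-- data (the table must declare SAMPLE BY for this clause to be usable).
SELECT
    uniq(UserID)            AS approx_distinct_users,
    quantile(0.5)(Duration) AS approx_median_duration
FROM hits
SAMPLE 1 / 10;
```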

docs/architecture/cluster-deployment.md
Lines changed: 4 additions & 4 deletions

@@ -14,10 +14,10 @@ By going through this tutorial, you'll learn how to set up a simple ClickHouse c

This ClickHouse cluster will be a homogeneous cluster. Here are the steps:

- 1. Install ClickHouse server on all machines of the cluster
- 2. Set up cluster configs in configuration files
- 3. Create local tables on each instance
- 4. Create a [Distributed table](../engines/table-engines/special/distributed.md)
+ 1. Install ClickHouse server on all machines of the cluster
+ 2. Set up cluster configs in configuration files
+ 3. Create local tables on each instance
+ 4. Create a [Distributed table](../engines/table-engines/special/distributed.md)

A [distributed table](../engines/table-engines/special/distributed.md) is a kind of "view" to the local tables in a ClickHouse cluster. A SELECT query from a distributed table executes using resources of all cluster's shards. You may specify configs for multiple clusters and create multiple distributed tables to provide views for different clusters.
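Steps 3 and 4 of that list look roughly like this in SQL; a minimal sketch, assuming a hypothetical cluster named `my_cluster` (defined in the configuration files from step 2) and a database `db`:

```sql
-- Step 3: a local table, created on every node of the cluster.
CREATE TABLE db.events_local ON CLUSTER my_cluster
(
    ts      DateTime,
    user_id UInt64
)
ENGINE = MergeTree
ORDER BY ts;

-- Step 4: the Distributed "view"; SELECTs fan out to events_local on all
-- shards, and rand() spreads INSERTs across them.
CREATE TABLE db.events_all ON CLUSTER my_cluster AS db.events_local
ENGINE = Distributed(my_cluster, db, events_local, rand());
```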

docs/cloud/changelogs/24_02.md
Lines changed: 1 addition & 1 deletion

@@ -19,7 +19,7 @@ sidebar_position: 8
* The obsolete in-memory data parts have been deprecated since version 23.5 and have not been supported since version 23.10. Now the remaining code is removed. Continuation of [#55186](https://github.com/ClickHouse/ClickHouse/issues/55186) and [#45409](https://github.com/ClickHouse/ClickHouse/issues/45409). It is unlikely that you have used in-memory data parts because they were available only before version 23.5 and only when you enabled them manually by specifying the corresponding SETTINGS for a MergeTree table. To check if you have in-memory data parts, run the following query: `SELECT part_type, count() FROM system.parts GROUP BY part_type ORDER BY part_type`. To disable the usage of in-memory data parts, do `ALTER TABLE ... MODIFY SETTING min_bytes_for_compact_part = DEFAULT, min_rows_for_compact_part = DEFAULT`. Before upgrading from old ClickHouse releases, first check that you don't have in-memory data parts. If there are in-memory data parts, disable them first, then wait while there are no in-memory data parts and continue the upgrade. [#61127](https://github.com/ClickHouse/ClickHouse/pull/61127) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
* Forbid `SimpleAggregateFunction` in `ORDER BY` of `MergeTree` tables (like `AggregateFunction` is forbidden, but they are forbidden because they are not comparable) by default (use `allow_suspicious_primary_key` to allow them). [#61399](https://github.com/ClickHouse/ClickHouse/pull/61399) ([Azat Khuzhin](https://github.com/azat)).
* ClickHouse allows arbitrary binary data in the String data type, which is typically UTF-8. Parquet/ORC/Arrow Strings only support UTF-8. That's why you can choose which Arrow's data type to use for the ClickHouse String data type - String or Binary. This is controlled by the settings, `output_format_parquet_string_as_string`, `output_format_orc_string_as_string`, `output_format_arrow_string_as_string`. While Binary would be more correct and compatible, using String by default will correspond to user expectations in most cases. Parquet/ORC/Arrow supports many compression methods, including lz4 and zstd. ClickHouse supports each and every compression method. Some inferior tools lack support for the faster `lz4` compression method, that's why we set `zstd` by default. This is controlled by the settings `output_format_parquet_compression_method`, `output_format_orc_compression_method`, and `output_format_arrow_compression_method`. We changed the default to `zstd` for Parquet and ORC, but not Arrow (it is emphasized for low-level usages). [#61817](https://github.com/ClickHouse/ClickHouse/pull/61817) ([Alexey Milovidov](https://github.com/alexey-milovidov)).
- * Fix for the materialized view security issue, which allowed a user to insert into a table without required grants for that. Fix validates that the user has permission to insert not only into a materialized view but also into all underlying tables. This means that some queries, which worked before, now can fail with Not enough privileges. To address this problem, the release introduces a new feature of SQL security for views [https://clickhouse.com/docs/sql-reference/statements/create/view#sql_security](/sql-reference/statements/create/view#sql_security). [#54901](https://github.com/ClickHouse/ClickHouse/pull/54901) ([pufit](https://github.com/pufit))
+ * Fix for the materialized view security issue, which allowed a user to insert into a table without required grants for that. Fix validates that the user has permission to insert not only into a materialized view but also into all underlying tables. This means that some queries, which worked before, now can fail with Not enough privileges. To address this problem, the release introduces a new feature of SQL security for views [https://clickhouse.com/docs/sql-reference/statements/create/view#sql_security](/sql-reference/statements/create/view#sql_security). [#54901](https://github.com/ClickHouse/ClickHouse/pull/54901) ([pufit](https://github.com/pufit))

#### New feature {#new-feature}
* `topK`/`topKWeighted` support a mode that returns the count of values and its error. [#54508](https://github.com/ClickHouse/ClickHouse/pull/54508) ([UnamedRus](https://github.com/UnamedRus)).
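The pre-upgrade check described in the first entry of this changelog hunk can be restated as a short script; the queries come from the entry itself, and `my_table` is a placeholder:

```sql
-- Any part_type other than 'Compact' or 'Wide' (i.e. 'InMemory') must be
-- dealt with before upgrading past this release.
SELECT part_type, count()
FROM system.parts
GROUP BY part_type
ORDER BY part_type;

-- For each affected table, fall back to the default part types, then wait
-- until no in-memory parts remain before continuing the upgrade.
ALTER TABLE my_table MODIFY SETTING
    min_bytes_for_compact_part = DEFAULT,
    min_rows_for_compact_part = DEFAULT;
```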

docs/faq/operations/production.md
Lines changed: 8 additions & 8 deletions

@@ -10,8 +10,8 @@ description: 'This page provides guidance on which ClickHouse version to use in

First of all, let's discuss why people ask this question in the first place. There are two key reasons:

- 1. ClickHouse is developed with pretty high velocity, and usually there are 10+ stable releases per year. That makes a wide range of releases to choose from, which is not so trivial of a choice.
- 2. Some users want to avoid spending time figuring out which version works best for their use case and just follow someone else's advice.
+ 1. ClickHouse is developed with pretty high velocity, and usually there are 10+ stable releases per year. That makes a wide range of releases to choose from, which is not so trivial of a choice.
+ 2. Some users want to avoid spending time figuring out which version works best for their use case and just follow someone else's advice.

The second reason is more fundamental, so we'll start with that one and then get back to navigating through various ClickHouse releases.

@@ -39,19 +39,19 @@ Here are some key points to get reasonable fidelity in a pre-production environm

When you have your pre-production environment and testing infrastructure in place, choosing the best version is straightforward:

- 1. Routinely run your automated tests against new ClickHouse releases. You can do it even for ClickHouse releases that are marked as `testing`, but going forward to the next steps with them is not recommended.
- 2. Deploy the ClickHouse release that passed the tests to pre-production and check that all processes are running as expected.
- 3. Report any issues you discovered to [ClickHouse GitHub Issues](https://github.com/ClickHouse/ClickHouse/issues).
- 4. If there were no major issues, it should be safe to start deploying ClickHouse release to your production environment. Investing in gradual release automation that implements an approach similar to [canary releases](https://martinfowler.com/bliki/CanaryRelease.html) or [green-blue deployments](https://martinfowler.com/bliki/BlueGreenDeployment.html) might further reduce the risk of issues in production.
+ 1. Routinely run your automated tests against new ClickHouse releases. You can do it even for ClickHouse releases that are marked as `testing`, but going forward to the next steps with them is not recommended.
+ 2. Deploy the ClickHouse release that passed the tests to pre-production and check that all processes are running as expected.
+ 3. Report any issues you discovered to [ClickHouse GitHub Issues](https://github.com/ClickHouse/ClickHouse/issues).
+ 4. If there were no major issues, it should be safe to start deploying ClickHouse release to your production environment. Investing in gradual release automation that implements an approach similar to [canary releases](https://martinfowler.com/bliki/CanaryRelease.html) or [green-blue deployments](https://martinfowler.com/bliki/BlueGreenDeployment.html) might further reduce the risk of issues in production.

As you might have noticed, there's nothing specific to ClickHouse in the approach described above - people do that for any piece of infrastructure they rely on if they take their production environment seriously.

## How to choose between ClickHouse releases? {#how-to-choose-between-clickhouse-releases}

If you look into the contents of the ClickHouse package repository, you'll see two kinds of packages:

- 1. `stable`
- 2. `lts` (long-term support)
+ 1. `stable`
+ 2. `lts` (long-term support)

Here is some guidance on how to choose between them:
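For the gradual rollout in step 4 of the list above, a fleet-wide version check is a convenient guardrail; a minimal sketch, assuming a hypothetical cluster definition named `my_cluster`:

```sql
-- During a canary or blue-green rollout, confirm which ClickHouse versions
-- are actually serving traffic across the cluster.
SELECT hostName() AS host, version() AS clickhouse_version
FROM clusterAllReplicas('my_cluster', system.one)
ORDER BY host;
```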

docs/guides/best-practices/query-optimization.md
Lines changed: 7 additions & 7 deletions

@@ -301,9 +301,9 @@ Summarize in the table for easy reading.

Let's understand a bit better what the queries achieve.

- - Query 1 calculates the distance distribution in rides with an average speed of over 30 miles per hour.
- - Query 2 finds the number and average cost of rides per week.
- - Query 3 calculates the average time of each trip in the dataset.
+ - Query 1 calculates the distance distribution in rides with an average speed of over 30 miles per hour.
+ - Query 2 finds the number and average cost of rides per week.
+ - Query 3 calculates the average time of each trip in the dataset.

None of these queries are doing very complex processing, except the first query that calculates the trip time on the fly every time the query executes. However, each of these queries takes more than one second to execute, which, in the ClickHouse world, is a very long time. We can also note the memory usage of these queries; more or less 400 Mb for each query is quite a lot of memory. Also, each query appears to read the same number of rows (i.e., 329.04 million). Let's quickly confirm how many rows are in this table.

@@ -315,7 +315,7 @@ Let's understand a bit better what the queries achieve.
Query id: 733372c5-deaf-4719-94e3-261540933b23

┌───count()─┐
- 1. │ 329044175 │ -- 329.04 million
+ 1. │ 329044175 │ -- 329.04 million
└───────────┘
```

@@ -600,9 +600,9 @@ Choosing the correct set of primary keys is a complex topic, and it might requir

For now, we're going to follow these simple practices:

- - Use fields that are used to filter in most queries
- - Choose columns with lower cardinality first
- - Consider a time-based component in your primary key, as filtering by time on a timestamp dataset is pretty common.
+ - Use fields that are used to filter in most queries
+ - Choose columns with lower cardinality first
+ - Consider a time-based component in your primary key, as filtering by time on a timestamp dataset is pretty common.

In our case, we will experiment with the following primary keys: `passenger_count`, `pickup_datetime`, and `dropoff_datetime`.
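Trying those candidate keys means rebuilding the table with a different sort key; a minimal sketch with an abbreviated, hypothetical schema (the real NYC taxi dataset has many more columns):

```sql
-- Lower-cardinality passenger_count first, then the time-based columns,
-- following the three practices listed in the hunk above.
CREATE TABLE trips_pk_experiment
(
    passenger_count  UInt8,
    pickup_datetime  DateTime,
    dropoff_datetime DateTime,
    trip_distance    Float32,
    fare_amount      Float32
)
ENGINE = MergeTree
ORDER BY (passenger_count, pickup_datetime, dropoff_datetime);
```

Repopulating this table and re-running the three benchmark queries shows whether rows read and memory usage actually drop.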

docs/guides/best-practices/sparse-primary-indexes.md
Lines changed: 1 addition & 1 deletion

@@ -750,7 +750,7 @@ As an example for both cases we will assume:

There are three different scenarios for the granule selection process for our abstract sample data in the diagram above:

- 1. Index mark 0 for which the **URL value is smaller than W3 and for which the URL value of the directly succeeding index mark is also smaller than W3** can be excluded because mark 0, and 1 have the same UserID value. Note that this exclusion-precondition ensures that granule 0 is completely composed of U1 UserID values so that ClickHouse can assume that also the maximum URL value in granule 0 is smaller than W3 and exclude the granule.
+ 1. Index mark 0 for which the **URL value is smaller than W3 and for which the URL value of the directly succeeding index mark is also smaller than W3** can be excluded because mark 0, and 1 have the same UserID value. Note that this exclusion-precondition ensures that granule 0 is completely composed of U1 UserID values so that ClickHouse can assume that also the maximum URL value in granule 0 is smaller than W3 and exclude the granule.

2. Index mark 1 for which the **URL value is smaller (or equal) than W3 and for which the URL value of the directly succeeding index mark is greater (or equal) than W3** is selected because it means that granule 1 can possibly contain rows with URL W3.
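ClickHouse can report the outcome of this granule-selection logic directly; a minimal sketch, assuming the guide's example table name and a hypothetical URL literal standing in for the abstract W3:

```sql
-- EXPLAIN indexes = 1 shows how many granules the primary index kept after
-- the mark-exclusion reasoning described above.
EXPLAIN indexes = 1
SELECT count()
FROM hits_UserID_URL
WHERE URL = 'http://example.com/some-page';
```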

docs/guides/developer/mutations.md
Lines changed: 4 additions & 4 deletions

@@ -31,23 +31,23 @@ ALTER TABLE [<database>.]<table> UPDATE <column> = <expression> WHERE <filter_ex

**Examples**:

- 1. A mutation like this allows updating `visitor_ids`, replacing them with new ones from a dictionary lookup:
+ 1. A mutation like this allows updating `visitor_ids`, replacing them with new ones from a dictionary lookup:

```sql
ALTER TABLE website.clicks
UPDATE visitor_id = dictGet('visitors', 'new_visitor_id', visitor_id)
WHERE visit_date < '2022-01-01'
```

- 2. Modifying multiple values in one command can be more efficient than multiple commands:
+ 2. Modifying multiple values in one command can be more efficient than multiple commands:

```sql
ALTER TABLE website.clicks
UPDATE url = substring(url, position(url, '://') + 3), visitor_id = new_visit_id
WHERE visit_date < '2022-01-01'
```

- 3. Mutations can be executed `ON CLUSTER` for sharded tables:
+ 3. Mutations can be executed `ON CLUSTER` for sharded tables:

```sql
ALTER TABLE clicks ON CLUSTER main_cluster

@@ -76,7 +76,7 @@ The `<filter_expr>` should return a UInt8 value for each row of data.
ALTER TABLE website.clicks DELETE WHERE visitor_id in (253, 1002, 4277)
```

- 2. What does this query alter?
+ 2. What does this query alter?
```sql
ALTER TABLE clicks ON CLUSTER main_cluster DELETE WHERE visit_date < '2022-01-02 15:00:00' AND page_id = '573'
```
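Because mutations like the ones in this file run asynchronously, the usual follow-up is to watch their progress; a minimal sketch using the standard `system.mutations` table (`clicks` is the table from the examples above):

```sql
-- Lists mutations still in flight for the clicks table, plus any failure
-- reason recorded for ones that got stuck.
SELECT mutation_id, command, is_done, latest_fail_reason
FROM system.mutations
WHERE table = 'clicks' AND NOT is_done;
```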

docs/guides/sre/configuring-ssl.md
Lines changed: 1 addition & 1 deletion

@@ -236,7 +236,7 @@ For a full explanation of all options, visit https://clickhouse.com/docs/operati
## 5. Configure SSL-TLS interfaces on ClickHouse nodes {#5-configure-ssl-tls-interfaces-on-clickhouse-nodes}
The settings below are configured in the ClickHouse server `config.xml`

- 1. Set the display name for the deployment (optional):
+ 1. Set the display name for the deployment (optional):
```xml
<display_name>clickhouse</display_name>
```
