docs/guides/sre/keeper/index.md: 27 additions & 32 deletions
@@ -30,7 +30,7 @@ External integrations are not supported.
### Configuration {#configuration}
ClickHouse Keeper can be used as a standalone replacement for ZooKeeper or as an internal part of the ClickHouse server. In both cases the configuration is almost the same `.xml` file.
@@ -430,9 +430,9 @@ Example of configuration that enables `/ready` endpoint:
### Feature flags {#feature-flags}
Keeper is fully compatible with ZooKeeper and its clients, but it also introduces some unique features and request types that can be used by the ClickHouse client.
Because those features can introduce backward-incompatible changes, most of them are disabled by default and can be enabled using the `keeper_server.feature_flags` config.
All features can be disabled explicitly.
If you want to enable a new feature for your Keeper cluster, we recommend first updating all the Keeper instances in the cluster to a version that supports the feature and then enabling the feature itself.
Example of feature flag config that disables `multi_read` and enables `check_not_exists`:
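The config block itself is not shown in this hunk. A minimal sketch of what such a `feature_flags` section might look like, with illustrative values rather than ones taken from this change:

```xml
<clickhouse>
    <keeper_server>
        <feature_flags>
            <!-- illustrative values: disable multi_read, enable check_not_exists -->
            <multi_read>0</multi_read>
            <check_not_exists>1</check_not_exists>
        </feature_flags>
    </keeper_server>
</clickhouse>
```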
@@ -450,9 +450,9 @@ Example of feature flag config that disables `multi_read` and enables `check_not
The following features are available:
`multi_read` - support for read multi request. Default: `1`
`filtered_list` - support for list request which filters results by the type of node (ephemeral or persistent). Default: `1`
`check_not_exists` - support for `CheckNotExists` request which asserts that a node doesn't exist. Default: `0`
`create_if_not_exists` - support for `CreateIfNotExists` requests which will try to create a node if it doesn't exist. If it exists, no changes are applied and `ZOK` is returned. Default: `0`
### Migration from ZooKeeper {#migration-from-zookeeper}
@@ -469,10 +469,10 @@ Seamless migration from ZooKeeper to ClickHouse Keeper is not possible. You have
4. Copy the snapshot to ClickHouse server nodes with a configured `keeper` or start ClickHouse Keeper instead of ZooKeeper. The snapshot must persist on all nodes; otherwise, empty nodes can be faster and one of them can become a leader.
:::note
The `keeper-converter` tool is not available from the Keeper standalone binary.
If you have ClickHouse installed, you can use the binary directly:
```bash
@@ -554,19 +554,19 @@ Following is an example of disk definitions contained inside a config.
</clickhouse>
```
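The disk definitions themselves are not shown in this hunk. Below is a hedged sketch of how a local disk and an `s3_plain` disk could be declared under `storage_configuration`; the disk names match those used later in this section, while paths, endpoint, and credentials are illustrative assumptions:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <!-- local disk, assumed path -->
            <log_local>
                <type>local</type>
                <path>/var/lib/clickhouse/coordination/logs/</path>
            </log_local>
            <!-- S3-backed disk, assumed endpoint and credentials -->
            <log_s3_plain>
                <type>s3_plain</type>
                <endpoint>https://s3.example.com/keeper-bucket/logs/</endpoint>
                <access_key_id>ACCESS_KEY</access_key_id>
                <secret_access_key>SECRET_KEY</secret_access_key>
            </log_s3_plain>
        </disks>
    </storage_configuration>
</clickhouse>
```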
To use a disk for logs, the `keeper_server.log_storage_disk` config should be set to the name of the disk.
To use a disk for snapshots, the `keeper_server.snapshot_storage_disk` config should be set to the name of the disk.
Additionally, different disks can be used for the latest logs or snapshots by using `keeper_server.latest_log_storage_disk` and `keeper_server.latest_snapshot_storage_disk` respectively.
In that case, Keeper will automatically move files to the correct disks when new logs or snapshots are created.
To use a disk for the state file, the `keeper_server.state_storage_disk` config should be set to the name of the disk.
Moving files between disks is safe, and there is no risk of losing data if Keeper stops in the middle of a transfer.
Until the file is completely moved to the new disk, it's not deleted from the old one.
Keeper with `keeper_server.coordination_settings.force_sync` set to `true` (`true` by default) cannot satisfy some guarantees for all types of disks.
Right now, only disks of type `local` support persistent sync.
If `force_sync` is used, `log_storage_disk` should be a `local` disk if `latest_log_storage_disk` is not used.
If `latest_log_storage_disk` is used, it should always be a `local` disk.
If `force_sync` is disabled, disks of all types can be used in any setup.
A possible storage setup for a Keeper instance could look like the following:
@@ -583,7 +583,7 @@ A possible storage setup for a Keeper instance could look like following:
</clickhouse>
```
This instance will store all but the latest logs on disk `log_s3_plain`, while the latest log will be on the `log_local` disk.
The same logic applies for snapshots: all but the latest snapshots will be stored on `snapshot_s3_plain`, while the latest snapshot will be on the `snapshot_local` disk.
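The storage-setup example referenced above is not shown in the diff; a hedged sketch of how such a tiered setup might be expressed, using the disk names mentioned in the surrounding text:

```xml
<clickhouse>
    <keeper_server>
        <!-- older logs and snapshots go to the S3-backed disks -->
        <log_storage_disk>log_s3_plain</log_storage_disk>
        <snapshot_storage_disk>snapshot_s3_plain</snapshot_storage_disk>
        <!-- the latest log and snapshot stay on local disks -->
        <latest_log_storage_disk>log_local</latest_log_storage_disk>
        <latest_snapshot_storage_disk>snapshot_local</latest_snapshot_storage_disk>
        <!-- a state_storage_disk entry could likewise point the state file at a local disk (assumed) -->
    </keeper_server>
</clickhouse>
```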
### Changing disk setup {#changing-disk-setup}
@@ -592,9 +592,9 @@ Same logic applies for snapshots, all but the latest snapshots will be stored on
Before applying a new disk setup, manually back up all Keeper logs and snapshots.
:::
If a tiered disk setup is defined (using separate disks for the latest files), Keeper will try to automatically move files to the correct disks on startup.
The same guarantee applies as before; until the file is completely moved to the new disk, it's not deleted from the old one, so multiple restarts can be safely done.
If it's necessary to move files to a completely new disk (or move from a 2-disk setup to a single disk setup), it's possible to use multiple definitions of `keeper_server.old_snapshot_storage_disk` and `keeper_server.old_log_storage_disk`.
@@ -614,22 +614,22 @@ The following config shows how we can move from the previous 2-disk setup to a c
</clickhouse>
```
On startup, all the log files will be moved from `log_local` and `log_s3_plain` to the `log_local2` disk.
Also, all the snapshot files will be moved from `snapshot_local` and `snapshot_s3_plain` to the `snapshot_local2` disk.
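The migration config referenced above is likewise not shown; a sketch of what it might look like, assuming the disk names used in this section:

```xml
<clickhouse>
    <keeper_server>
        <!-- disks that currently hold log files -->
        <old_log_storage_disk>log_local</old_log_storage_disk>
        <old_log_storage_disk>log_s3_plain</old_log_storage_disk>
        <!-- single target disk for all logs -->
        <log_storage_disk>log_local2</log_storage_disk>

        <!-- disks that currently hold snapshot files -->
        <old_snapshot_storage_disk>snapshot_local</old_snapshot_storage_disk>
        <old_snapshot_storage_disk>snapshot_s3_plain</old_snapshot_storage_disk>
        <!-- single target disk for all snapshots -->
        <snapshot_storage_disk>snapshot_local2</snapshot_storage_disk>
    </keeper_server>
</clickhouse>
```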
docs/integrations/data-ingestion/data-formats/json/inference.md: 0 additions & 1 deletion
@@ -180,7 +180,6 @@ CREATE TABLE arxiv
)
ENGINE = MergeTree
ORDER BY update_date
- SETTINGS index_granularity = 8192
```
The above is the correct schema for this data. Schema inference is based on sampling the data and reading it row by row. Column values are extracted according to the format, with recursive parsers and heuristics used to determine the type for each value. The maximum number of rows and bytes read from the data during schema inference is controlled by the settings [`input_format_max_rows_to_read_for_schema_inference`](/operations/settings/formats#input_format_max_rows_to_read_for_schema_inference) (25000 by default) and [`input_format_max_bytes_to_read_for_schema_inference`](/operations/settings/formats#input_format_max_bytes_to_read_for_schema_inference) (32MB by default). If detection is not correct, users can provide hints as described [here](/operations/settings/formats#schema_inference_make_columns_nullable).
docs/integrations/data-ingestion/s3/index.md: 5 additions & 6 deletions
@@ -149,7 +149,6 @@ CREATE TABLE trips
ENGINE = MergeTree
PARTITION BY toYYYYMM(pickup_date)
ORDER BY pickup_datetime
- SETTINGS index_granularity = 8192
```
Note the use of [partitioning](/engines/table-engines/mergetree-family/custom-partitioning-key) on the `pickup_date` field. Usually a partition key is for data management, but later on we will use this key to parallelize writes to S3.
- PARTITION BY toYYYYMM(pickup_date) ORDER BY pickup_datetime SETTINGS index_granularity = 8192, storage_policy = 's3_main'
+ PARTITION BY toYYYYMM(pickup_date) ORDER BY pickup_datetime SETTINGS storage_policy = 's3_main'
1 row in set. Elapsed: 0.012 sec.
```
@@ -1228,9 +1227,9 @@ These tests will verify that data is being replicated across the two servers, an
## S3Express {#s3express}
[S3Express](https://aws.amazon.com/s3/storage-classes/express-one-zone/) is a new high-performance, single-Availability Zone storage class in Amazon S3.
You can refer to this [blog](https://aws.amazon.com/blogs/storage/clickhouse-cloud-amazon-s3-express-one-zone-making-a-blazing-fast-analytical-database-even-faster/) to read about our experience testing S3Express with ClickHouse.
:::note
S3Express stores data within a single AZ. This means data will be unavailable in the case of an AZ outage.
Again, this will correlate with the columns corresponding to the OTel official specification for traces documented [here](https://opentelemetry.io/docs/specs/otel/trace/api/). The schema here employs many of the same settings as the above logs schema, with additional Link columns specific to spans.
By default, data with an expired TTL is removed when ClickHouse [merges data parts](/engines/table-engines/mergetree-family/mergetree#mergetree-data-storage). When ClickHouse detects that data is expired, it performs an off-schedule merge.
@@ -300,7 +300,7 @@ ORDER BY (ServiceName, Timestamp)
CREATE MATERIALIZED VIEW otel_logs_mv TO otel_logs_v2 AS