Commit ce07fa8

Merge pull request #3527 from ClickHouse/issue_3089
remove index granularity
2 parents: 4b65ff2 + cefb0d6

7 files changed: +36 -45 lines changed

Besides dropping `SETTINGS index_granularity = 8192` from example DDL, the commit also touches lines whose visible text is unchanged; in the hunks below, a `-`/`+` pair with identical text differs only in whitespace (e.g. trailing whitespace removed).

docs/cloud/reference/shared-merge-tree.md (0 additions, 1 deletion)

````diff
@@ -98,7 +98,6 @@ CREATE TABLE default.myFirstReplacingMT
 ( `key` Int64, `someCol` String, `eventTime` DateTime )
 ENGINE = SharedReplacingMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
 ORDER BY key
-SETTINGS index_granularity = 8192
 ```
 
 ## Settings {#settings}
````
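
The deletion is cosmetic rather than behavioral: `8192` is the built-in default for `index_granularity`, so the DDL is equivalent without it. A quick check (a sketch, assuming a stock ClickHouse server) against `system.merge_tree_settings`:

```sql
-- Confirm the value that applies when SETTINGS index_granularity is omitted.
SELECT name, value
FROM system.merge_tree_settings
WHERE name = 'index_granularity';
```
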

docs/guides/sre/keeper/index.md (27 additions, 32 deletions)

````diff
@@ -30,7 +30,7 @@ External integrations are not supported.
 
 ### Configuration {#configuration}
 
-ClickHouse Keeper can be used as a standalone replacement for ZooKeeper or as an internal part of the ClickHouse server. In both cases the configuration is almost the same `.xml` file.
+ClickHouse Keeper can be used as a standalone replacement for ZooKeeper or as an internal part of the ClickHouse server. In both cases the configuration is almost the same `.xml` file.
 
 #### Keeper configuration settings {#keeper-configuration-settings}
 
````
````diff
@@ -430,9 +430,9 @@ Example of configuration that enables `/ready` endpoint:
 
 ### Feature flags {#feature-flags}
 
-Keeper is fully compatible with ZooKeeper and its clients, but it also introduces some unique features and request types that can be used by ClickHouse client.
-Because those features can introduce backward incompatible change, most of them are disabled by default and can be enabled using `keeper_server.feature_flags` config.
-All features can be disabled explicitly.
+Keeper is fully compatible with ZooKeeper and its clients, but it also introduces some unique features and request types that can be used by ClickHouse client.
+Because those features can introduce backward incompatible change, most of them are disabled by default and can be enabled using `keeper_server.feature_flags` config.
+All features can be disabled explicitly.
 If you want to enable a new feature for your Keeper cluster, we recommend you to first update all the Keeper instances in the cluster to a version that supports the feature and then enable the feature itself.
 
 Example of feature flag config that disables `multi_read` and enables `check_not_exists`:
````
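
The example config referenced by that last context line sits outside the hunk; a minimal sketch of such a `keeper_server.feature_flags` block, assuming each flag name doubles as an XML tag with `0`/`1` values:

```xml
<clickhouse>
    <keeper_server>
        <feature_flags>
            <!-- 0 disables a feature, 1 enables it -->
            <multi_read>0</multi_read>
            <check_not_exists>1</check_not_exists>
        </feature_flags>
    </keeper_server>
</clickhouse>
```
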
````diff
@@ -450,9 +450,9 @@ Example of feature flag config that disables `multi_read` and enables `check_not
 
 The following features are available:
 
-`multi_read` - support for read multi request. Default: `1`
-`filtered_list` - support for list request which filters results by the type of node (ephemeral or persistent). Default: `1`
-`check_not_exists` - support for `CheckNotExists` request which asserts that node doesn't exists. Default: `0`
+`multi_read` - support for read multi request. Default: `1`
+`filtered_list` - support for list request which filters results by the type of node (ephemeral or persistent). Default: `1`
+`check_not_exists` - support for `CheckNotExists` request which asserts that node doesn't exists. Default: `0`
 `create_if_not_exists` - support for `CreateIfNotExists` requests which will try to create a node if it doesn't exist. If it exists, no changes are applied and `ZOK` is returned. Default: `0`
 
 ### Migration from ZooKeeper {#migration-from-zookeeper}
````
````diff
@@ -469,10 +469,10 @@ Seamless migration from ZooKeeper to ClickHouse Keeper is not possible. You have
 clickhouse-keeper-converter --zookeeper-logs-dir /var/lib/zookeeper/version-2 --zookeeper-snapshots-dir /var/lib/zookeeper/version-2 --output-dir /path/to/clickhouse/keeper/snapshots
 ```
 
-4. Copy snapshot to ClickHouse server nodes with a configured `keeper` or start ClickHouse Keeper instead of ZooKeeper. The snapshot must persist on all nodes, otherwise, empty nodes can be faster and one of them can become a leader.
+4. Copy snapshot to ClickHouse server nodes with a configured `keeper` or start ClickHouse Keeper instead of ZooKeeper. The snapshot must persist on all nodes, otherwise, empty nodes can be faster and one of them can become a leader.
 
 :::note
-`keeper-converter` tool is not available from the Keeper standalone binary.
+`keeper-converter` tool is not available from the Keeper standalone binary.
 If you have ClickHouse installed, you can use the binary directly:
 
 ```bash
````
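
The bash block opened by that last context line continues beyond the hunk; a sketch of the command it plausibly contains, assuming the `clickhouse` multi-tool binary accepts the same flags as the standalone converter:

```bash
# Run the converter through the clickhouse multi-tool binary
# (same flags as clickhouse-keeper-converter above; paths are examples).
clickhouse keeper-converter \
    --zookeeper-logs-dir /var/lib/zookeeper/version-2 \
    --zookeeper-snapshots-dir /var/lib/zookeeper/version-2 \
    --output-dir /path/to/clickhouse/keeper/snapshots
```
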
````diff
@@ -554,19 +554,19 @@ Following is an example of disk definitions contained inside a config.
 </clickhouse>
 ```
 
-To use a disk for logs `keeper_server.log_storage_disk` config should be set to the name of disk.
-To use a disk for snapshots `keeper_server.snapshot_storage_disk` config should be set to the name of disk.
-Additionally, different disks can be used for the latest logs or snapshots by using `keeper_server.latest_log_storage_disk` and `keeper_server.latest_snapshot_storage_disk` respectively.
+To use a disk for logs `keeper_server.log_storage_disk` config should be set to the name of disk.
+To use a disk for snapshots `keeper_server.snapshot_storage_disk` config should be set to the name of disk.
+Additionally, different disks can be used for the latest logs or snapshots by using `keeper_server.latest_log_storage_disk` and `keeper_server.latest_snapshot_storage_disk` respectively.
 In that case, Keeper will automatically move files to correct disks when new logs or snapshots are created.
-To use a disk for state file, `keeper_server.state_storage_disk` config should be set to the name of disk.
+To use a disk for state file, `keeper_server.state_storage_disk` config should be set to the name of disk.
 
 Moving files between disks is safe and there is no risk of losing data if Keeper stops in the middle of transfer.
 Until the file is completely moved to the new disk, it's not deleted from the old one.
 
-Keeper with `keeper_server.coordination_settings.force_sync` set to `true` (`true` by default) cannot satisfy some guarantees for all types of disks.
-Right now, only disks of type `local` support persistent sync.
-If `force_sync` is used, `log_storage_disk` should be a `local` disk if `latest_log_storage_disk` is not used.
-If `latest_log_storage_disk` is used, it should always be a `local` disk.
+Keeper with `keeper_server.coordination_settings.force_sync` set to `true` (`true` by default) cannot satisfy some guarantees for all types of disks.
+Right now, only disks of type `local` support persistent sync.
+If `force_sync` is used, `log_storage_disk` should be a `local` disk if `latest_log_storage_disk` is not used.
+If `latest_log_storage_disk` is used, it should always be a `local` disk.
 If `force_sync` is disabled, disks of all types can be used in any setup.
 
 A possible storage setup for a Keeper instance could look like following:
````
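
The config block itself falls outside the hunk; a sketch consistent with the description in the next hunk (the disk names `log_s3_plain`, `log_local`, `snapshot_s3_plain`, and `snapshot_local` come from there):

```xml
<clickhouse>
    <keeper_server>
        <!-- All but the latest log/snapshot go to S3; the latest stay local. -->
        <log_storage_disk>log_s3_plain</log_storage_disk>
        <latest_log_storage_disk>log_local</latest_log_storage_disk>
        <snapshot_storage_disk>snapshot_s3_plain</snapshot_storage_disk>
        <latest_snapshot_storage_disk>snapshot_local</latest_snapshot_storage_disk>
    </keeper_server>
</clickhouse>
```
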
````diff
@@ -583,7 +583,7 @@ A possible storage setup for a Keeper instance could look like following:
 </clickhouse>
 ```
 
-This instance will store all but the latest logs on disk `log_s3_plain`, while the latest log will be on the `log_local` disk.
+This instance will store all but the latest logs on disk `log_s3_plain`, while the latest log will be on the `log_local` disk.
 Same logic applies for snapshots, all but the latest snapshots will be stored on `snapshot_s3_plain`, while the latest snapshot will be on the `snapshot_local` disk.
 
 ### Changing disk setup {#changing-disk-setup}
````
````diff
@@ -592,9 +592,9 @@ Same logic applies for snapshots, all but the latest snapshots will be stored on
 Before applying a new disk setup, manually back up all Keeper logs and snapshots.
 :::
 
-If a tiered disk setup is defined (using separate disks for the latest files), Keeper will try to automatically move files to the correct disks on startup.
+If a tiered disk setup is defined (using separate disks for the latest files), Keeper will try to automatically move files to the correct disks on startup.
 The same guarantee is applied as before; until the file is completely moved to the new disk, it's not deleted from the old one, so multiple restarts
-can be safely done.
+can be safely done.
 
 If it's necessary to move files to a completely new disk (or move from a 2-disk setup to a single disk setup), it's possible to use multiple definitions of `keeper_server.old_snapshot_storage_disk` and `keeper_server.old_log_storage_disk`.
 
````
````diff
@@ -614,22 +614,22 @@ The following config shows how we can move from the previous 2-disk setup to a c
 </clickhouse>
 ```
 
-On startup, all the log files will be moved from `log_local` and `log_s3_plain` to the `log_local2` disk.
+On startup, all the log files will be moved from `log_local` and `log_s3_plain` to the `log_local2` disk.
 Also, all the snapshot files will be moved from `snapshot_local` and `snapshot_s3_plain` to the `snapshot_local2` disk.
 
 ## Configuring logs cache {#configuring-logs-cache}
 
-To minimize the amount of data read from disk, Keeper caches log entries in memory.
-If requests are large, log entries will take too much memory so the amount of cached logs is capped.
+To minimize the amount of data read from disk, Keeper caches log entries in memory.
+If requests are large, log entries will take too much memory so the amount of cached logs is capped.
 The limit is controlled with these two configs:
 - `latest_logs_cache_size_threshold` - total size of latest logs stored in cache
 - `commit_logs_cache_size_threshold` - total size of subsequent logs that need to be committed next
 
 If the default values are too big, you can reduce the memory usage by reducing these two configs.
 
 :::note
-You can use `pfev` command to check amount of logs read from each cache and from a file.
-You can also use metrics from Prometheus endpoint to track the current size of both caches.
+You can use `pfev` command to check amount of logs read from each cache and from a file.
+You can also use metrics from Prometheus endpoint to track the current size of both caches.
 :::
 
````
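
As a reference point, a sketch of lowering those two thresholds; the placement under `keeper_server.coordination_settings` and the byte values are assumptions, not taken from the diff:

```xml
<clickhouse>
    <keeper_server>
        <coordination_settings>
            <!-- Assumed placement; sizes are in bytes and chosen arbitrarily. -->
            <latest_logs_cache_size_threshold>536870912</latest_logs_cache_size_threshold>
            <commit_logs_cache_size_threshold>268435456</commit_logs_cache_size_threshold>
        </coordination_settings>
    </keeper_server>
</clickhouse>
```
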

````diff
@@ -883,7 +883,7 @@ This guide provides simple and minimal settings to configure ClickHouse Keeper w
 ```
 
 On `chnode2`:
-6.
+6.
 ```sql
 SELECT *
 FROM db1.table1
````
````diff
@@ -1180,18 +1180,13 @@ SHOW CREATE TABLE db_uuid.uuid_table1;
 ```response
 SHOW CREATE TABLE db_uuid.uuid_table1
 
-Query id: 5925ecce-a54f-47d8-9c3a-ad3257840c9e
-
-┌─statement────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
-│ CREATE TABLE db_uuid.uuid_table1
+CREATE TABLE db_uuid.uuid_table1
 (
     `id` UInt64,
     `column1` String
 )
 ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/db_uuid/{uuid}', '{replica}')
 ORDER BY id
-SETTINGS index_granularity = 8192 │
-└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
 
 1 row in set. Elapsed: 0.003 sec.
 ```
````

docs/integrations/data-ingestion/data-formats/json/inference.md (0 additions, 1 deletion)

````diff
@@ -180,7 +180,6 @@ CREATE TABLE arxiv
 )
 ENGINE = MergeTree
 ORDER BY update_date
-SETTINGS index_granularity = 8192
 ```
 
 The above is the correct schema for this data. Schema inference is based on sampling the data and reading the data row by row. Column values are extracted according to the format, with recursive parsers and heuristics used to determine the type for each value. The maximum number of rows and bytes read from the data in schema inference is controlled by the settings [`input_format_max_rows_to_read_for_schema_inference`](/operations/settings/formats#input_format_max_rows_to_read_for_schema_inference) (25000 by default) and [`input_format_max_bytes_to_read_for_schema_inference`](/operations/settings/formats#input_format_max_bytes_to_read_for_schema_inference) (32MB by default). In the event detection is not correct, users can provide hints as described [here](/operations/settings/formats#schema_inference_make_columns_nullable).
````
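
When the first rows of a file are not representative, those two settings can be raised per query; a sketch (the S3 URL is hypothetical, the settings are the ones just named):

```sql
-- Infer a schema from a larger sample than the 25000-row default.
DESCRIBE TABLE s3('https://example-bucket.s3.amazonaws.com/arxiv.json.gz', 'JSONEachRow')
SETTINGS input_format_max_rows_to_read_for_schema_inference = 100000;
```
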

docs/integrations/data-ingestion/etl-tools/dbt/index.md (0 additions, 1 deletion)

````diff
@@ -483,7 +483,6 @@ In the previous example, our model was materialized as a view. While this might
 |)
 |ENGINE = MergeTree
 |ORDER BY (id, first_name, last_name)
-|SETTINGS index_granularity = 8192
 +----------------------------------------
 ```
 
````
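The boxed output above is dbt's rendering of the generated DDL. For orientation, a sketch of the model config that could produce such a table materialization, assuming the dbt-clickhouse adapter's `engine` and `order_by` config keys (model and ref names are hypothetical):

```sql
-- models/actor_summary.sql (hypothetical model file)
{{ config(
    materialized='table',
    engine='MergeTree()',
    order_by='(id, first_name, last_name)'
) }}

select id, first_name, last_name
from {{ ref('actors') }} -- hypothetical upstream model
```
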

docs/integrations/data-ingestion/s3/index.md (5 additions, 6 deletions)

````diff
@@ -149,7 +149,6 @@ CREATE TABLE trips
 ENGINE = MergeTree
 PARTITION BY toYYYYMM(pickup_date)
 ORDER BY pickup_datetime
-SETTINGS index_granularity = 8192
 ```
 
 Note the use of [partitioning](/engines/table-engines/mergetree-family/custom-partitioning-key) on the `pickup_date` field. Usually a partition key is for data management, but later on we will use this key to parallelize writes to S3.
````
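
That last context line hints at the later trick: reusing the partition key to split an S3 export into one object per partition. A sketch, with a hypothetical bucket and credentials (the `{_partition_id}` placeholder and `INSERT INTO FUNCTION ... PARTITION BY` are standard ClickHouse):

```sql
-- Write one gzipped CSV object per month, named after the partition id.
INSERT INTO FUNCTION
    s3('https://example-bucket.s3.amazonaws.com/trips_{_partition_id}.csv.gz',
       '<access_key>', '<secret_key>', 'CSVWithNames')
    PARTITION BY toYYYYMM(pickup_date)
SELECT * FROM trips;
```
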
````diff
@@ -629,7 +628,7 @@ CREATE TABLE trips_s3
 ENGINE = MergeTree
 PARTITION BY toYYYYMM(pickup_date)
 ORDER BY pickup_datetime
-SETTINGS index_granularity = 8192, storage_policy='s3_main'
+SETTINGS storage_policy='s3_main'
 ```
 
 ```sql
````
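
`storage_policy='s3_main'` only works if the server config declares a matching policy; a minimal sketch of the disk/policy pair it presumes (endpoint and credentials are placeholders):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3_disk>
                <type>s3</type>
                <endpoint>https://example-bucket.s3.amazonaws.com/data/</endpoint>
                <access_key_id>KEY</access_key_id>
                <secret_access_key>SECRET</secret_access_key>
            </s3_disk>
        </disks>
        <policies>
            <s3_main>
                <volumes>
                    <main>
                        <disk>s3_disk</disk>
                    </main>
                </volumes>
            </s3_main>
        </policies>
    </storage_configuration>
</clickhouse>
```
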
````diff
@@ -1131,7 +1130,7 @@ When you added the [cluster configuration](#define-a-cluster) a single shard rep
 ENGINE = ReplicatedMergeTree
 PARTITION BY toYYYYMM(pickup_date)
 ORDER BY pickup_datetime
-SETTINGS index_granularity = 8192, storage_policy='s3_main'
+SETTINGS storage_policy='s3_main'
 ```
 ```response
 ┌─host────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
````
````diff
@@ -1156,7 +1155,7 @@ When you added the [cluster configuration](#define-a-cluster) a single shard rep
 create_table_query: CREATE TABLE default.trips (`trip_id` UInt32, `pickup_date` Date, `pickup_datetime` DateTime, `dropoff_datetime` DateTime, `pickup_longitude` Float64, `pickup_latitude` Float64, `dropoff_longitude` Float64, `dropoff_latitude` Float64, `passenger_count` UInt8, `trip_distance` Float64, `tip_amount` Float32, `total_amount` Float32, `payment_type` Enum8('UNK' = 0, 'CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4))
 # highlight-next-line
 ENGINE = ReplicatedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
-PARTITION BY toYYYYMM(pickup_date) ORDER BY pickup_datetime SETTINGS index_granularity = 8192, storage_policy = 's3_main'
+PARTITION BY toYYYYMM(pickup_date) ORDER BY pickup_datetime SETTINGS storage_policy = 's3_main'
 
 1 row in set. Elapsed: 0.012 sec.
 ```
````
````diff
@@ -1228,9 +1227,9 @@ These tests will verify that data is being replicated across the two servers, an
 
 ## S3Express {#s3express}
 
-[S3Express](https://aws.amazon.com/s3/storage-classes/express-one-zone/) is a new high-performance, single-Availability Zone storage class in Amazon S3.
+[S3Express](https://aws.amazon.com/s3/storage-classes/express-one-zone/) is a new high-performance, single-Availability Zone storage class in Amazon S3.
 
-You could refer to this [blog](https://aws.amazon.com/blogs/storage/clickhouse-cloud-amazon-s3-express-one-zone-making-a-blazing-fast-analytical-database-even-faster/) to read about our experience testing S3Express with ClickHouse.
+You could refer to this [blog](https://aws.amazon.com/blogs/storage/clickhouse-cloud-amazon-s3-express-one-zone-making-a-blazing-fast-analytical-database-even-faster/) to read about our experience testing S3Express with ClickHouse.
 
 :::note
 S3Express stores data within a single AZ. It means data will be unavailable in case of AZ outage.
````

docs/use-cases/observability/integrating-opentelemetry.md (2 additions, 2 deletions)

````diff
@@ -525,7 +525,7 @@ ENGINE = MergeTree
 PARTITION BY toDate(Timestamp)
 ORDER BY (ServiceName, SeverityText, toUnixTimestamp(Timestamp), TraceId)
 TTL toDateTime(Timestamp) + toIntervalDay(3)
-SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
+SETTINGS ttl_only_drop_parts = 1
 ```
 
 The columns here correlate with the OTel official specification for logs documented [here](https://opentelemetry.io/docs/specs/otel/logs/data-model/).
````
````diff
@@ -577,7 +577,7 @@ ENGINE = MergeTree
 PARTITION BY toDate(Timestamp)
 ORDER BY (ServiceName, SpanName, toUnixTimestamp(Timestamp), TraceId)
 TTL toDateTime(Timestamp) + toIntervalDay(3)
-SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
+SETTINGS ttl_only_drop_parts = 1
 ```
 
 Again, this will correlate with the columns corresponding to OTel official specification for traces documented [here](https://opentelemetry.io/docs/specs/otel/trace/api/). The schema here employs many of the same settings as the above logs schema with additional Link columns specific to spans.
````

docs/use-cases/observability/managing-data.md (2 additions, 2 deletions)

````diff
@@ -179,7 +179,7 @@ This syntax currently supports [Golang Duration syntax](https://pkg.go.dev/time#
 PARTITION BY toDate(Timestamp)
 ORDER BY (ServiceName, SpanName, toUnixTimestamp(Timestamp), TraceId)
 TTL toDateTime(Timestamp) + toIntervalDay(4)
-SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
+SETTINGS ttl_only_drop_parts = 1
 ```
 
 By default, data with an expired TTL is removed when ClickHouse [merges data parts](/engines/table-engines/mergetree-family/mergetree#mergetree-data-storage). When ClickHouse detects that data is expired, it performs an off-schedule merge.
````
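
Since expiry piggybacks on merges, rows can outlive their TTL until a merge happens. A sketch of forcing TTL re-evaluation on existing parts (`ALTER TABLE ... MATERIALIZE TTL` is standard ClickHouse; the table name here is hypothetical):

```sql
-- Re-apply the TTL expression to parts written before it changed.
ALTER TABLE otel_traces MATERIALIZE TTL;
```
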
````diff
@@ -300,7 +300,7 @@ ORDER BY (ServiceName, Timestamp)
 
 CREATE MATERIALIZED VIEW otel_logs_mv TO otel_logs_v2 AS
 SELECT
-Body,
+Body,
 Timestamp::DateTime AS Timestamp,
 ServiceName,
 LogAttributes['status']::UInt16 AS Status,
````
