Commit ce07fa8

Merge pull request #3527 from ClickHouse/issue_3089
remove index granularity
2 parents: 4b65ff2 + cefb0d6

7 files changed: +36 -45 lines changed

Besides dropping `SETTINGS index_granularity = 8192` from example DDL, the commit also touches lines whose visible text is unchanged; in the hunks below, a `-`/`+` pair with identical text differs only in whitespace (e.g. trailing whitespace removed).

docs/cloud/reference/shared-merge-tree.md (0 additions, 1 deletion)

````diff
@@ -98,7 +98,6 @@ CREATE TABLE default.myFirstReplacingMT
 ( `key` Int64, `someCol` String, `eventTime` DateTime )
 ENGINE = SharedReplacingMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
 ORDER BY key
-SETTINGS index_granularity = 8192
 ```
 
 ## Settings {#settings}
````
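
The deletion is cosmetic rather than behavioral: `8192` is the built-in default for `index_granularity`, so the DDL is equivalent without it. A quick check (a sketch, assuming a stock ClickHouse server) against `system.merge_tree_settings`:

```sql
-- Confirm the value that applies when SETTINGS index_granularity is omitted.
SELECT name, value
FROM system.merge_tree_settings
WHERE name = 'index_granularity';
```
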

docs/guides/sre/keeper/index.md (27 additions, 32 deletions)

````diff
@@ -30,7 +30,7 @@ External integrations are not supported.
 
 ### Configuration {#configuration}
 
-ClickHouse Keeper can be used as a standalone replacement for ZooKeeper or as an internal part of the ClickHouse server. In both cases the configuration is almost the same `.xml` file.
+ClickHouse Keeper can be used as a standalone replacement for ZooKeeper or as an internal part of the ClickHouse server. In both cases the configuration is almost the same `.xml` file.
 
 #### Keeper configuration settings {#keeper-configuration-settings}
 
````
````diff
@@ -430,9 +430,9 @@ Example of configuration that enables `/ready` endpoint:
 
 ### Feature flags {#feature-flags}
 
-Keeper is fully compatible with ZooKeeper and its clients, but it also introduces some unique features and request types that can be used by ClickHouse client.
-Because those features can introduce backward incompatible change, most of them are disabled by default and can be enabled using `keeper_server.feature_flags` config.
-All features can be disabled explicitly.
+Keeper is fully compatible with ZooKeeper and its clients, but it also introduces some unique features and request types that can be used by ClickHouse client.
+Because those features can introduce backward incompatible change, most of them are disabled by default and can be enabled using `keeper_server.feature_flags` config.
+All features can be disabled explicitly.
 If you want to enable a new feature for your Keeper cluster, we recommend you to first update all the Keeper instances in the cluster to a version that supports the feature and then enable the feature itself.
 
 Example of feature flag config that disables `multi_read` and enables `check_not_exists`:
````
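
The example config referenced by that last context line sits outside the hunk; a minimal sketch of such a `keeper_server.feature_flags` block, assuming each flag name doubles as an XML tag with `0`/`1` values:

```xml
<clickhouse>
    <keeper_server>
        <feature_flags>
            <!-- 0 disables a feature, 1 enables it -->
            <multi_read>0</multi_read>
            <check_not_exists>1</check_not_exists>
        </feature_flags>
    </keeper_server>
</clickhouse>
```
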
````diff
@@ -450,9 +450,9 @@ Example of feature flag config that disables `multi_read` and enables `check_not
 
 The following features are available:
 
-`multi_read` - support for read multi request. Default: `1`
-`filtered_list` - support for list request which filters results by the type of node (ephemeral or persistent). Default: `1`
-`check_not_exists` - support for `CheckNotExists` request which asserts that node doesn't exists. Default: `0`
+`multi_read` - support for read multi request. Default: `1`
+`filtered_list` - support for list request which filters results by the type of node (ephemeral or persistent). Default: `1`
+`check_not_exists` - support for `CheckNotExists` request which asserts that node doesn't exists. Default: `0`
 `create_if_not_exists` - support for `CreateIfNotExists` requests which will try to create a node if it doesn't exist. If it exists, no changes are applied and `ZOK` is returned. Default: `0`
 
 ### Migration from ZooKeeper {#migration-from-zookeeper}
````
````diff
@@ -469,10 +469,10 @@ Seamless migration from ZooKeeper to ClickHouse Keeper is not possible. You have
 clickhouse-keeper-converter --zookeeper-logs-dir /var/lib/zookeeper/version-2 --zookeeper-snapshots-dir /var/lib/zookeeper/version-2 --output-dir /path/to/clickhouse/keeper/snapshots
 ```
 
-4. Copy snapshot to ClickHouse server nodes with a configured `keeper` or start ClickHouse Keeper instead of ZooKeeper. The snapshot must persist on all nodes, otherwise, empty nodes can be faster and one of them can become a leader.
+4. Copy snapshot to ClickHouse server nodes with a configured `keeper` or start ClickHouse Keeper instead of ZooKeeper. The snapshot must persist on all nodes, otherwise, empty nodes can be faster and one of them can become a leader.
 
 :::note
-`keeper-converter` tool is not available from the Keeper standalone binary.
+`keeper-converter` tool is not available from the Keeper standalone binary.
 If you have ClickHouse installed, you can use the binary directly:
 
 ```bash
````
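
The bash block opened by that last context line continues beyond the hunk; a sketch of the command it plausibly contains, assuming the `clickhouse` multi-tool binary accepts the same flags as the standalone converter:

```bash
# Run the converter through the clickhouse multi-tool binary
# (same flags as clickhouse-keeper-converter above; paths are examples).
clickhouse keeper-converter \
    --zookeeper-logs-dir /var/lib/zookeeper/version-2 \
    --zookeeper-snapshots-dir /var/lib/zookeeper/version-2 \
    --output-dir /path/to/clickhouse/keeper/snapshots
```
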
````diff
@@ -554,19 +554,19 @@ Following is an example of disk definitions contained inside a config.
 </clickhouse>
 ```
 
-To use a disk for logs `keeper_server.log_storage_disk` config should be set to the name of disk.
-To use a disk for snapshots `keeper_server.snapshot_storage_disk` config should be set to the name of disk.
-Additionally, different disks can be used for the latest logs or snapshots by using `keeper_server.latest_log_storage_disk` and `keeper_server.latest_snapshot_storage_disk` respectively.
+To use a disk for logs `keeper_server.log_storage_disk` config should be set to the name of disk.
+To use a disk for snapshots `keeper_server.snapshot_storage_disk` config should be set to the name of disk.
+Additionally, different disks can be used for the latest logs or snapshots by using `keeper_server.latest_log_storage_disk` and `keeper_server.latest_snapshot_storage_disk` respectively.
 In that case, Keeper will automatically move files to correct disks when new logs or snapshots are created.
-To use a disk for state file, `keeper_server.state_storage_disk` config should be set to the name of disk.
+To use a disk for state file, `keeper_server.state_storage_disk` config should be set to the name of disk.
 
 Moving files between disks is safe and there is no risk of losing data if Keeper stops in the middle of transfer.
 Until the file is completely moved to the new disk, it's not deleted from the old one.
 
-Keeper with `keeper_server.coordination_settings.force_sync` set to `true` (`true` by default) cannot satisfy some guarantees for all types of disks.
-Right now, only disks of type `local` support persistent sync.
-If `force_sync` is used, `log_storage_disk` should be a `local` disk if `latest_log_storage_disk` is not used.
-If `latest_log_storage_disk` is used, it should always be a `local` disk.
+Keeper with `keeper_server.coordination_settings.force_sync` set to `true` (`true` by default) cannot satisfy some guarantees for all types of disks.
+Right now, only disks of type `local` support persistent sync.
+If `force_sync` is used, `log_storage_disk` should be a `local` disk if `latest_log_storage_disk` is not used.
+If `latest_log_storage_disk` is used, it should always be a `local` disk.
 If `force_sync` is disabled, disks of all types can be used in any setup.
 
 A possible storage setup for a Keeper instance could look like following:
````
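
The config block itself falls outside the hunk; a sketch consistent with the description in the next hunk (the disk names `log_s3_plain`, `log_local`, `snapshot_s3_plain`, and `snapshot_local` come from there):

```xml
<clickhouse>
    <keeper_server>
        <!-- All but the latest log/snapshot go to S3; the latest stay local. -->
        <log_storage_disk>log_s3_plain</log_storage_disk>
        <latest_log_storage_disk>log_local</latest_log_storage_disk>
        <snapshot_storage_disk>snapshot_s3_plain</snapshot_storage_disk>
        <latest_snapshot_storage_disk>snapshot_local</latest_snapshot_storage_disk>
    </keeper_server>
</clickhouse>
```
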
````diff
@@ -583,7 +583,7 @@ A possible storage setup for a Keeper instance could look like following:
 </clickhouse>
 ```
 
-This instance will store all but the latest logs on disk `log_s3_plain`, while the latest log will be on the `log_local` disk.
+This instance will store all but the latest logs on disk `log_s3_plain`, while the latest log will be on the `log_local` disk.
 Same logic applies for snapshots, all but the latest snapshots will be stored on `snapshot_s3_plain`, while the latest snapshot will be on the `snapshot_local` disk.
 
 ### Changing disk setup {#changing-disk-setup}
````
````diff
@@ -592,9 +592,9 @@ Same logic applies for snapshots, all but the latest snapshots will be stored on
 Before applying a new disk setup, manually back up all Keeper logs and snapshots.
 :::
 
-If a tiered disk setup is defined (using separate disks for the latest files), Keeper will try to automatically move files to the correct disks on startup.
+If a tiered disk setup is defined (using separate disks for the latest files), Keeper will try to automatically move files to the correct disks on startup.
 The same guarantee is applied as before; until the file is completely moved to the new disk, it's not deleted from the old one, so multiple restarts
-can be safely done.
+can be safely done.
 
 If it's necessary to move files to a completely new disk (or move from a 2-disk setup to a single disk setup), it's possible to use multiple definitions of `keeper_server.old_snapshot_storage_disk` and `keeper_server.old_log_storage_disk`.
 
````
````diff
@@ -614,22 +614,22 @@ The following config shows how we can move from the previous 2-disk setup to a c
 </clickhouse>
 ```
 
-On startup, all the log files will be moved from `log_local` and `log_s3_plain` to the `log_local2` disk.
+On startup, all the log files will be moved from `log_local` and `log_s3_plain` to the `log_local2` disk.
 Also, all the snapshot files will be moved from `snapshot_local` and `snapshot_s3_plain` to the `snapshot_local2` disk.
 
 ## Configuring logs cache {#configuring-logs-cache}
 
-To minimize the amount of data read from disk, Keeper caches log entries in memory.
-If requests are large, log entries will take too much memory so the amount of cached logs is capped.
+To minimize the amount of data read from disk, Keeper caches log entries in memory.
+If requests are large, log entries will take too much memory so the amount of cached logs is capped.
 The limit is controlled with these two configs:
 - `latest_logs_cache_size_threshold` - total size of latest logs stored in cache
 - `commit_logs_cache_size_threshold` - total size of subsequent logs that need to be committed next
 
 If the default values are too big, you can reduce the memory usage by reducing these two configs.
 
 :::note
-You can use `pfev` command to check amount of logs read from each cache and from a file.
-You can also use metrics from Prometheus endpoint to track the current size of both caches.
+You can use `pfev` command to check amount of logs read from each cache and from a file.
+You can also use metrics from Prometheus endpoint to track the current size of both caches.
 :::
 
````
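
As a reference point, a sketch of lowering those two thresholds; the placement under `keeper_server.coordination_settings` and the byte values are assumptions, not taken from the diff:

```xml
<clickhouse>
    <keeper_server>
        <coordination_settings>
            <!-- Assumed placement; sizes are in bytes and chosen arbitrarily. -->
            <latest_logs_cache_size_threshold>536870912</latest_logs_cache_size_threshold>
            <commit_logs_cache_size_threshold>268435456</commit_logs_cache_size_threshold>
        </coordination_settings>
    </keeper_server>
</clickhouse>
```
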

````diff
@@ -883,7 +883,7 @@ This guide provides simple and minimal settings to configure ClickHouse Keeper w
 ```
 
 On `chnode2`:
-6.
+6.
 ```sql
 SELECT *
 FROM db1.table1
````
````diff
@@ -1180,18 +1180,13 @@ SHOW CREATE TABLE db_uuid.uuid_table1;
 ```response
 SHOW CREATE TABLE db_uuid.uuid_table1
 
-Query id: 5925ecce-a54f-47d8-9c3a-ad3257840c9e
-
-┌─statement────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
-│ CREATE TABLE db_uuid.uuid_table1
+CREATE TABLE db_uuid.uuid_table1
 (
     `id` UInt64,
     `column1` String
 )
 ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/db_uuid/{uuid}', '{replica}')
 ORDER BY id
-SETTINGS index_granularity = 8192 │
-└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
 
 1 row in set. Elapsed: 0.003 sec.
 ```
````

docs/integrations/data-ingestion/data-formats/json/inference.md (0 additions, 1 deletion)

````diff
@@ -180,7 +180,6 @@ CREATE TABLE arxiv
 )
 ENGINE = MergeTree
 ORDER BY update_date
-SETTINGS index_granularity = 8192
 ```
 
 The above is the correct schema for this data. Schema inference is based on sampling the data and reading the data row by row. Column values are extracted according to the format, with recursive parsers and heuristics used to determine the type for each value. The maximum number of rows and bytes read from the data in schema inference is controlled by the settings [`input_format_max_rows_to_read_for_schema_inference`](/operations/settings/formats#input_format_max_rows_to_read_for_schema_inference) (25000 by default) and [`input_format_max_bytes_to_read_for_schema_inference`](/operations/settings/formats#input_format_max_bytes_to_read_for_schema_inference) (32MB by default). In the event detection is not correct, users can provide hints as described [here](/operations/settings/formats#schema_inference_make_columns_nullable).
````
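
When the first rows of a file are not representative, those two settings can be raised per query; a sketch (the S3 URL is hypothetical, the settings are the ones just named):

```sql
-- Infer a schema from a larger sample than the 25000-row default.
DESCRIBE TABLE s3('https://example-bucket.s3.amazonaws.com/arxiv.json.gz', 'JSONEachRow')
SETTINGS input_format_max_rows_to_read_for_schema_inference = 100000;
```
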

docs/integrations/data-ingestion/etl-tools/dbt/index.md (0 additions, 1 deletion)

````diff
@@ -483,7 +483,6 @@ In the previous example, our model was materialized as a view. While this might
 |)
 |ENGINE = MergeTree
 |ORDER BY (id, first_name, last_name)
-|SETTINGS index_granularity = 8192
 +----------------------------------------
 ```
 
````
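The boxed output above is dbt's rendering of the generated DDL. For orientation, a sketch of the model config that could produce such a table materialization, assuming the dbt-clickhouse adapter's `engine` and `order_by` config keys (model and ref names are hypothetical):

```sql
-- models/actor_summary.sql (hypothetical model file)
{{ config(
    materialized='table',
    engine='MergeTree()',
    order_by='(id, first_name, last_name)'
) }}

select id, first_name, last_name
from {{ ref('actors') }} -- hypothetical upstream model
```
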

docs/integrations/data-ingestion/s3/index.md (5 additions, 6 deletions)

````diff
@@ -149,7 +149,6 @@ CREATE TABLE trips
 ENGINE = MergeTree
 PARTITION BY toYYYYMM(pickup_date)
 ORDER BY pickup_datetime
-SETTINGS index_granularity = 8192
 ```
 
 Note the use of [partitioning](/engines/table-engines/mergetree-family/custom-partitioning-key) on the `pickup_date` field. Usually a partition key is for data management, but later on we will use this key to parallelize writes to S3.
````
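
That last context line hints at the later trick: reusing the partition key to split an S3 export into one object per partition. A sketch, with a hypothetical bucket and credentials (the `{_partition_id}` placeholder and `INSERT INTO FUNCTION ... PARTITION BY` are standard ClickHouse):

```sql
-- Write one gzipped CSV object per month, named after the partition id.
INSERT INTO FUNCTION
    s3('https://example-bucket.s3.amazonaws.com/trips_{_partition_id}.csv.gz',
       '<access_key>', '<secret_key>', 'CSVWithNames')
    PARTITION BY toYYYYMM(pickup_date)
SELECT * FROM trips;
```
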
````diff
@@ -629,7 +628,7 @@ CREATE TABLE trips_s3
 ENGINE = MergeTree
 PARTITION BY toYYYYMM(pickup_date)
 ORDER BY pickup_datetime
-SETTINGS index_granularity = 8192, storage_policy='s3_main'
+SETTINGS storage_policy='s3_main'
 ```
 
 ```sql
````
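
`storage_policy='s3_main'` only works if the server config declares a matching policy; a minimal sketch of the disk/policy pair it presumes (endpoint and credentials are placeholders):

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <s3_disk>
                <type>s3</type>
                <endpoint>https://example-bucket.s3.amazonaws.com/data/</endpoint>
                <access_key_id>KEY</access_key_id>
                <secret_access_key>SECRET</secret_access_key>
            </s3_disk>
        </disks>
        <policies>
            <s3_main>
                <volumes>
                    <main>
                        <disk>s3_disk</disk>
                    </main>
                </volumes>
            </s3_main>
        </policies>
    </storage_configuration>
</clickhouse>
```
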
````diff
@@ -1131,7 +1130,7 @@ When you added the [cluster configuration](#define-a-cluster) a single shard rep
 ENGINE = ReplicatedMergeTree
 PARTITION BY toYYYYMM(pickup_date)
 ORDER BY pickup_datetime
-SETTINGS index_granularity = 8192, storage_policy='s3_main'
+SETTINGS storage_policy='s3_main'
 ```
 ```response
 ┌─host────┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
````
````diff
@@ -1156,7 +1155,7 @@ When you added the [cluster configuration](#define-a-cluster) a single shard rep
 create_table_query: CREATE TABLE default.trips (`trip_id` UInt32, `pickup_date` Date, `pickup_datetime` DateTime, `dropoff_datetime` DateTime, `pickup_longitude` Float64, `pickup_latitude` Float64, `dropoff_longitude` Float64, `dropoff_latitude` Float64, `passenger_count` UInt8, `trip_distance` Float64, `tip_amount` Float32, `total_amount` Float32, `payment_type` Enum8('UNK' = 0, 'CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4))
 # highlight-next-line
 ENGINE = ReplicatedMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}')
-PARTITION BY toYYYYMM(pickup_date) ORDER BY pickup_datetime SETTINGS index_granularity = 8192, storage_policy = 's3_main'
+PARTITION BY toYYYYMM(pickup_date) ORDER BY pickup_datetime SETTINGS storage_policy = 's3_main'
 
 1 row in set. Elapsed: 0.012 sec.
 ```
````
````diff
@@ -1228,9 +1227,9 @@ These tests will verify that data is being replicated across the two servers, an
 
 ## S3Express {#s3express}
 
-[S3Express](https://aws.amazon.com/s3/storage-classes/express-one-zone/) is a new high-performance, single-Availability Zone storage class in Amazon S3.
+[S3Express](https://aws.amazon.com/s3/storage-classes/express-one-zone/) is a new high-performance, single-Availability Zone storage class in Amazon S3.
 
-You could refer to this [blog](https://aws.amazon.com/blogs/storage/clickhouse-cloud-amazon-s3-express-one-zone-making-a-blazing-fast-analytical-database-even-faster/) to read about our experience testing S3Express with ClickHouse.
+You could refer to this [blog](https://aws.amazon.com/blogs/storage/clickhouse-cloud-amazon-s3-express-one-zone-making-a-blazing-fast-analytical-database-even-faster/) to read about our experience testing S3Express with ClickHouse.
 
 :::note
 S3Express stores data within a single AZ. It means data will be unavailable in case of AZ outage.
````

docs/use-cases/observability/integrating-opentelemetry.md (2 additions, 2 deletions)

````diff
@@ -525,7 +525,7 @@ ENGINE = MergeTree
 PARTITION BY toDate(Timestamp)
 ORDER BY (ServiceName, SeverityText, toUnixTimestamp(Timestamp), TraceId)
 TTL toDateTime(Timestamp) + toIntervalDay(3)
-SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
+SETTINGS ttl_only_drop_parts = 1
 ```
 
 The columns here correlate with the OTel official specification for logs documented [here](https://opentelemetry.io/docs/specs/otel/logs/data-model/).
````
````diff
@@ -577,7 +577,7 @@ ENGINE = MergeTree
 PARTITION BY toDate(Timestamp)
 ORDER BY (ServiceName, SpanName, toUnixTimestamp(Timestamp), TraceId)
 TTL toDateTime(Timestamp) + toIntervalDay(3)
-SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
+SETTINGS ttl_only_drop_parts = 1
 ```
 
 Again, this will correlate with the columns corresponding to OTel official specification for traces documented [here](https://opentelemetry.io/docs/specs/otel/trace/api/). The schema here employs many of the same settings as the above logs schema with additional Link columns specific to spans.
````

docs/use-cases/observability/managing-data.md (2 additions, 2 deletions)

````diff
@@ -179,7 +179,7 @@ This syntax currently supports [Golang Duration syntax](https://pkg.go.dev/time#
 PARTITION BY toDate(Timestamp)
 ORDER BY (ServiceName, SpanName, toUnixTimestamp(Timestamp), TraceId)
 TTL toDateTime(Timestamp) + toIntervalDay(4)
-SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1
+SETTINGS ttl_only_drop_parts = 1
 ```
 
 By default, data with an expired TTL is removed when ClickHouse [merges data parts](/engines/table-engines/mergetree-family/mergetree#mergetree-data-storage). When ClickHouse detects that data is expired, it performs an off-schedule merge.
````
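
Since expiry piggybacks on merges, rows can outlive their TTL until a merge happens. A sketch of forcing TTL re-evaluation on existing parts (`ALTER TABLE ... MATERIALIZE TTL` is standard ClickHouse; the table name here is hypothetical):

```sql
-- Re-apply the TTL expression to parts written before it changed.
ALTER TABLE otel_traces MATERIALIZE TTL;
```
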
````diff
@@ -300,7 +300,7 @@ ORDER BY (ServiceName, Timestamp)
 
 CREATE MATERIALIZED VIEW otel_logs_mv TO otel_logs_v2 AS
 SELECT
-Body,
+Body,
 Timestamp::DateTime AS Timestamp,
 ServiceName,
 LogAttributes['status']::UInt16 AS Status,
````
