diff --git a/TOC.md b/TOC.md index 73b8e996af9f6..1899328102a4b 100644 --- a/TOC.md +++ b/TOC.md @@ -246,6 +246,7 @@ - [Use Overview](/br/br-use-overview.md) - [Snapshot Backup and Restore Guide](/br/br-snapshot-guide.md) - [Log Backup and PITR Guide](/br/br-pitr-guide.md) + - [Compact Log Backup](/br/br-compact-log-backup.md) - [Use Cases](/br/backup-and-restore-use-cases.md) - [Backup Storages](/br/backup-and-restore-storages.md) - BR CLI Manuals @@ -880,6 +881,7 @@ - [`SET ROLE`](/sql-statements/sql-statement-set-role.md) - [`SET TRANSACTION`](/sql-statements/sql-statement-set-transaction.md) - [`SET `](/sql-statements/sql-statement-set-variable.md) + - [`SHOW AFFINITY`](/sql-statements/sql-statement-show-affinity.md) - [`SHOW ANALYZE STATUS`](/sql-statements/sql-statement-show-analyze-status.md) - [`SHOW [BACKUPS|RESTORES]`](/sql-statements/sql-statement-show-backups.md) - [`SHOW BINDINGS`](/sql-statements/sql-statement-show-bindings.md) @@ -997,6 +999,7 @@ - [Temporary Tables](/temporary-tables.md) - [Cached Tables](/cached-tables.md) - [FOREIGN KEY Constraints](/foreign-key.md) + - [Table-Level Data Affinity](/table-affinity.md) - Character Set and Collation - [Overview](/character-set-and-collation.md) - [GBK](/character-set-gbk.md) diff --git a/best-practices/pd-scheduling-best-practices.md b/best-practices/pd-scheduling-best-practices.md index f9b2263a14f07..b81bd3217d9ba 100644 --- a/best-practices/pd-scheduling-best-practices.md +++ b/best-practices/pd-scheduling-best-practices.md @@ -296,7 +296,9 @@ If a TiKV node fails, PD defaults to setting the corresponding node to the **dow Practically, if a node failure is considered unrecoverable, you can immediately take it offline. This makes PD replenish replicas soon in another node and reduces the risk of data loss. In contrast, if a node is considered recoverable, but the recovery cannot be done in 30 minutes, you can temporarily adjust `max-store-down-time` to a larger value to avoid unnecessary replenishment of the replicas and resources waste after the timeout. -In TiDB v5.2.0, TiKV introduces the mechanism of slow TiKV node detection. By sampling the requests in TiKV, this mechanism works out a score ranging from 1 to 100. A TiKV node with a score higher than or equal to 80 is marked as slow. You can add [`evict-slow-store-scheduler`](/pd-control.md#scheduler-show--add--remove--pause--resume--config--describe) to detect and schedule slow nodes. If only one TiKV is detected as slow, and the slow score reaches the limit (80 by default), the Leader in this node will be evicted (similar to the effect of `evict-leader-scheduler`). +Starting from TiDB v5.2.0, TiKV introduces a mechanism to detect slow-disk nodes. By sampling the requests in TiKV, this mechanism works out a score ranging from 1 to 100. A TiKV node with a score higher than or equal to 80 is marked as slow. You can add [`evict-slow-store-scheduler`](/pd-control.md#scheduler-show--add--remove--pause--resume--config--describe) to schedule slow nodes. If only one TiKV node is detected as slow, and its slow score reaches the limit (80 by default), the Leaders on that node will be evicted (similar to the effect of `evict-leader-scheduler`). + +Starting from v8.5.5, TiKV introduces a mechanism to detect slow-network nodes. Similar to slow-disk node detection, this mechanism identifies slow nodes by probing network latency between TiKV nodes and calculating a score. 
You can enable this mechanism using [`enable-network-slow-store`](/pd-control.md#scheduler-config-evict-slow-store-scheduler). > **Note:** > diff --git a/br/backup-and-restore-overview.md b/br/backup-and-restore-overview.md index 825b886e88e63..e5f38b3effc68 100644 --- a/br/backup-and-restore-overview.md +++ b/br/backup-and-restore-overview.md @@ -21,7 +21,6 @@ This section describes the prerequisites for using TiDB backup and restore, incl ### Restrictions - PITR only supports restoring data to **an empty cluster**. -- PITR only supports cluster-level restore and does not support database-level or table-level restore. - PITR does not support restoring the data of user tables or privilege tables from system tables. - BR does not support running multiple backup tasks on a cluster **at the same time**. - It is not recommended to back up tables that are being restored, because the backed-up data might be problematic. diff --git a/br/backup-and-restore-storages.md b/br/backup-and-restore-storages.md index ebb19b7c6a5ab..e53d9a3ed4a58 100644 --- a/br/backup-and-restore-storages.md +++ b/br/backup-and-restore-storages.md @@ -202,6 +202,66 @@ You can configure the account used to access GCS by specifying the access key. I --storage "azure://external/backup-20220915?account-name=${account-name}" ``` +- Method 4: Use Azure managed identities + + Starting from v8.5.5, if your TiDB cluster and BR are running in an Azure Virtual Machine (VM) or Azure Kubernetes Service (AKS) environment and Azure managed identities have been assigned to the nodes, you can use Azure managed identities for authentication. + + Before using this method, ensure that you have granted the permissions (such as `Storage Blob Data Contributor`) to the corresponding managed identity to access the target storage account in the [Azure Portal](https://azure.microsoft.com/). + + - **System-assigned managed identity**: + + When using a system-assigned managed identity, there is no need to configure any Azure-related environment variables. You can run the BR backup command directly. + + ```shell + tiup br backup full -u "${PD_IP}:2379" \ + --storage "azure://external/backup-20220915?account-name=${account-name}" + ``` + + > **Note:** + > + > Ensure that the `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, and `AZURE_CLIENT_SECRET` environment variables are **not** set in the runtime environment. Otherwise, the Azure SDK might prioritize other authentication methods, preventing the managed identity from taking effect. + + - **User-assigned managed identity**: + + When using a user-assigned managed identity, you need to configure the `AZURE_CLIENT_ID` environment variable in the runtime environment of TiKV and BR, set its value to the client ID of the managed identity, and then run the BR backup command. The detailed steps are as follows: + + 1. Configure the client ID for TiKV when starting with TiUP: + + The following steps use the TiKV port `24000` and the systemd service name `tikv-24000` as an example: + + 1. Open the systemd service editor by running the following command: + + ```shell + systemctl edit tikv-24000 + ``` + + 2. Set the `AZURE_CLIENT_ID` environment variable to your managed identity client ID: + + ```ini + [Service] + Environment="AZURE_CLIENT_ID=" + ``` + + 3. Reload the systemd configuration and restart TiKV: + + ```shell + systemctl daemon-reload + systemctl restart tikv-24000 + ``` + + 2. Configure the `AZURE_CLIENT_ID` environment variable for BR: + + ```shell + export AZURE_CLIENT_ID="" + ``` + + 3. 
Back up data to Azure Blob Storage using the following BR command: + + ```shell + tiup br backup full -u "${PD_IP}:2379" \ + --storage "azure://external/backup-20220915?account-name=${account-name}" + ``` + diff --git a/br/backup-and-restore-use-cases.md b/br/backup-and-restore-use-cases.md index affaf6684f701..86c99f0909ae3 100644 --- a/br/backup-and-restore-use-cases.md +++ b/br/backup-and-restore-use-cases.md @@ -144,7 +144,9 @@ tiup br restore point --pd="${PD_IP}:2379" \ --full-backup-storage='s3://tidb-pitr-bucket/backup-data/snapshot-20220514000000' \ --restored-ts '2022-05-15 18:00:00+0800' -Full Restore <--------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00% +Split&Scatter Region <--------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00% +Download&Ingest SST <--------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00% +Restore Pipeline <--------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00% [2022/05/29 18:15:39.132 +08:00] [INFO] [collector.go:69] ["Full Restore success summary"] [total-ranges=12] [ranges-succeed=xxx] [ranges-failed=0] [split-region=xxx.xxxµs] [restore-ranges=xxx] [total-take=xxx.xxxs] [restore-data-size(after-compressed)=xxx.xxx] [Size=xxxx] [BackupTS={TS}] [total-kv=xxx] [total-kv-size=xxx] [average-speed=xxx] Restore Meta Files <--------------------------------------------------------------------------------------------------------------------------------------------------> 100.00% Restore KV Files <----------------------------------------------------------------------------------------------------------------------------------------------------> 100.00% diff --git a/br/br-checkpoint-restore.md b/br/br-checkpoint-restore.md index b94ecc47361a4..f9a1479bd6f93 100644 --- a/br/br-checkpoint-restore.md +++ b/br/br-checkpoint-restore.md @@ -15,7 +15,7 @@ If your TiDB cluster is large and cannot afford to restore again after a failure ## Implementation principles -The implementation of checkpoint restore is divided into two parts: snapshot restore and log restore. For more information, see [Implementation details](#implementation-details). +The implementation of checkpoint restore is divided into two parts: snapshot restore and log restore. For more information, see [Implementation details: store checkpoint data in the downstream cluster](#implementation-details-store-checkpoint-data-in-the-downstream-cluster) and [Implementation details: store checkpoint data in the external storage](#implementation-details-store-checkpoint-data-in-the-external-storage). ### Snapshot restore @@ -65,7 +65,11 @@ After a restore failure, avoid writing, deleting, or creating tables in the clus Cross-major-version checkpoint recovery is not recommended. For clusters where `br` recovery fails using the Long-Term Support (LTS) versions prior to v8.5.0, recovery cannot be continued with v8.5.0 or later LTS versions, and vice versa. -## Implementation details +## Implementation details: store checkpoint data in the downstream cluster + +> **Note:** +> +> Starting from v8.5.5, BR stores checkpoint data in the downstream cluster by default. 
You can specify an external storage for checkpoint data using the `--checkpoint-storage` parameter. Checkpoint restore operations are divided into two parts: snapshot restore and PITR restore. @@ -81,8 +85,78 @@ If the restore fails and you try to restore backup data with different checkpoin [PITR (Point-in-time recovery)](/br/br-pitr-guide.md) consists of snapshot restore and log restore phases. -During the initial restore, `br` first enters the snapshot restore phase. This phase follows the same process as the preceding [snapshot restore](#snapshot-restore-1): BR records the checkpoint data, the upstream cluster ID, and BackupTS of the backup data (that is, the start time point `start-ts` of log restore) in the `__TiDB_BR_Temporary_Snapshot_Restore_Checkpoint` database. If restore fails during this phase, you cannot adjust the `start-ts` of log restore when resuming checkpoint restore. +During the initial restore, `br` first enters the snapshot restore phase. BR records the checkpoint data, the upstream cluster ID, BackupTS of the backup data (that is, the start time point `start-ts` of log restore) and the restored time point `restored-ts` of log restore in the `__TiDB_BR_Temporary_Snapshot_Restore_Checkpoint` database. If restore fails during this phase, you cannot adjust the `start-ts` and `restored-ts` of log restore when resuming checkpoint restore. When entering the log restore phase during the initial restore, `br` creates a `__TiDB_BR_Temporary_Log_Restore_Checkpoint` database in the target cluster. This database records checkpoint data, the upstream cluster ID, and the restore time range (`start-ts` and `restored-ts`). If restore fails during this phase, you need to specify the same `start-ts` and `restored-ts` as recorded in the checkpoint database when retrying. Otherwise, `br` will report an error and prompt that the current specified restore time range or upstream cluster ID is different from the checkpoint record. If the restore cluster has been cleaned, you can manually delete the `__TiDB_BR_Temporary_Log_Restore_Checkpoint` database and retry with a different backup. -Before entering the log restore phase during the initial restore, `br` constructs a mapping of upstream and downstream cluster database and table IDs at the `restored-ts` time point. This mapping is persisted in the system table `mysql.tidb_pitr_id_map` to prevent duplicate allocation of database and table IDs. Deleting data from `mysql.tidb_pitr_id_map` might lead to inconsistent PITR restore data. +Note that before entering the log restore phase during the initial restore, `br` constructs a mapping of upstream and downstream cluster database and table IDs at the `restored-ts` time point. This mapping is persisted in the system table `mysql.tidb_pitr_id_map` to prevent duplicate allocation of database and table IDs. **Deleting data from `mysql.tidb_pitr_id_map` arbitrarily might lead to inconsistent PITR restore data.** + +> **Note:** +> +> To ensure compatibility with clusters of earlier versions, starting from v8.5.5, if the system table `mysql.tidb_pitr_id_map` does not exist in the restore cluster, the `pitr_id_map` data will be written to the log backup directory. The file name is `pitr_id_maps/pitr_id_map.cluster_id:{downstream-cluster-ID}.restored_ts:{restored-ts}`. + +## Implementation details: store checkpoint data in the external storage + +> **Note:** +> +> Starting from v8.5.5, BR stores checkpoint data in the downstream cluster by default. 
You can specify an external storage for checkpoint data using the `--checkpoint-storage` parameter. For example: +> +> ```shell +> ./br restore full -s "s3://backup-bucket/backup-prefix" --checkpoint-storage "s3://temp-bucket/checkpoints" +> ``` + +In the external storage, the directory structure of the checkpoint data is as follows: + +- Root path `restore-{downstream-cluster-ID}` uses the downstream cluster ID `{downstream-cluster-ID}` to distinguish between different restore clusters. +- Path `restore-{downstream-cluster-ID}/log` stores log file checkpoint data during the log restore phase. +- Path `restore-{downstream-cluster-ID}/sst` stores checkpoint data of the SST files that are not backed up by log backup during the log restore phase. +- Path `restore-{downstream-cluster-ID}/snapshot` stores checkpoint data during the snapshot restore phase. + +``` +. +`-- restore-{downstream-cluster-ID} + |-- log + | |-- checkpoint.meta + | |-- data + | | |-- {uuid}.cpt + | | |-- {uuid}.cpt + | | `-- {uuid}.cpt + | |-- ingest_index.meta + | `-- progress.meta + |-- snapshot + | |-- checkpoint.meta + | |-- checksum + | | |-- {uuid}.cpt + | | |-- {uuid}.cpt + | | `-- {uuid}.cpt + | `-- data + | |-- {uuid}.cpt + | |-- {uuid}.cpt + | `-- {uuid}.cpt + `-- sst + `-- checkpoint.meta +``` + +Checkpoint restore operations are divided into two parts: snapshot restore and PITR restore. + +### Snapshot restore + +During the initial restore, `br` creates a `restore-{downstream-cluster-ID}/snapshot` path in the specified external storage. In this path, `br` records checkpoint data, the upstream cluster ID, and the BackupTS of the backup data. + +If the restore fails, you can retry it using the same command. `br` will automatically read the checkpoint information from the specified external storage path and resume from the last restore point. + +If the restore fails and you try to restore backup data with different checkpoint information to the same cluster, `br` reports an error. It indicates that the current upstream cluster ID or BackupTS is different from the checkpoint record. If the restore cluster has been cleaned, you can manually clean up the checkpoint data in the external storage or specify another external storage path to store checkpoint data, and retry with a different backup. + +### PITR restore + +[PITR (Point-in-time recovery)](/br/br-pitr-guide.md) consists of snapshot restore and log restore phases. + +During the initial restore, `br` first enters the snapshot restore phase. BR records the checkpoint data, the upstream cluster ID, BackupTS of the backup data (that is, the start time point `start-ts` of log restore) and the restored time point `restored-ts` of log restore in the `restore-{downstream-cluster-ID}/snapshot` path. If restore fails during this phase, you cannot adjust the `start-ts` and `restored-ts` of log restore when resuming checkpoint restore. + +When entering the log restore phase during the initial restore, `br` creates a `restore-{downstream-cluster-ID}/log` path in the specified external storage. This path records checkpoint data, the upstream cluster ID, and the restore time range (`start-ts` and `restored-ts`). If restore fails during this phase, you need to specify the same `start-ts` and `restored-ts` as recorded in the checkpoint database when retrying. Otherwise, `br` will report an error and prompt that the current specified restore time range or upstream cluster ID is different from the checkpoint record. 
If the restore cluster has been cleaned, you can manually clean up the checkpoint data in the external storage or specify another external storage path to store checkpoint data, and retry with a different backup. + +Note that before entering the log restore phase during the initial restore, `br` constructs a mapping of the database and table IDs in the upstream and downstream clusters at the `restored-ts` time point. This mapping is persisted in the checkpoint storage with the file name `pitr_id_maps/pitr_id_map.cluster_id:{downstream-cluster-ID}.restored_ts:{restored-ts}` to prevent duplicate allocation of database and table IDs. **Deleting files from the directory `pitr_id_maps` arbitrarily might lead to inconsistent PITR restore data.** + +> **Note:** +> +> To ensure compatibility with clusters of earlier versions, starting from v8.5.5, if the system table `mysql.tidb_pitr_id_map` does not exist in the restore cluster and the `--checkpoint-storage` parameter is not specified, the `pitr_id_map` data will be written to the log backup directory. The file name is `pitr_id_maps/pitr_id_map.cluster_id:{downstream-cluster-ID}.restored_ts:{restored-ts}`. diff --git a/br/br-compact-log-backup.md b/br/br-compact-log-backup.md new file mode 100644 index 0000000000000..9c969f1148e52 --- /dev/null +++ b/br/br-compact-log-backup.md @@ -0,0 +1,85 @@ +--- +title: Compact Log Backup +summary: Learn how to improve Point-in-time Recovery (PITR) efficiency by compacting log backups into the SST format. +--- + +# Compact Log Backup + +This document describes how to improve the efficiency of point-in-time recovery ([PITR](/glossary.md#point-in-time-recovery-pitr)) by compacting log backups into the [SST](/glossary.md#static-sorted-table--sorted-string-table-sst) format. + +## Overview + +Traditional log backups store write operations in a highly unstructured manner, which can lead to the following issues: + +- **Reduced recovery performance**: unordered data has to be written to the cluster one by one through the Raft protocol. +- **Write amplification**: all writes must be compacted from L0 to the bottommost level by level. +- **Dependency on full backups**: frequent full backups are required to control the amount of recovery data, which can impact application operations. + +Starting from v8.5.5, the compact log backup feature provides offline compaction capabilities, converting unstructured log backup data into structured SST files. This results in the following improvements: + +- SST files can be quickly imported into the cluster, **improving recovery performance**. +- Redundant data is removed during compaction, **reducing storage space consumption**. +- You can set longer full backup intervals while ensuring the Recovery Time Objective (RTO), **reducing the impact on applications**. + +## Limitations + +- Compact log backup is not a replacement for full backups. It must be used in conjunction with periodic full backups. To ensure PITR capability, the compacting process retains all MVCC versions. Failing to perform full backups for a long time can lead to excessive storage usage and might cause issues when restoring data later. +- Currently, compacting backups with local encryption enabled is not supported. + +## Use compact log backup + +Currently, only manual compaction of log backups is supported, and the process is complex. 
**It is recommended to use the upcoming TiDB Operator solution for compacting log backups in production environments.** + +### Manual compaction + +This section describes the steps for manually compacting log backups. + +#### Prerequisites + +Manual compaction of log backups requires two tools: `tikv-ctl` and `br`. + +#### Step 1: Encode storage to Base64 + +Execute the following encoding command: + +```shell +br operator base64ify --storage "s3://your/log/backup/storage/here" --load-creds +``` + +> **Note:** +> +> - If the `--load-creds` option is included when you execute the preceding command, the encoded Base64 string contains credential information loaded from the current BR environment. Note to ensure proper security and access control. +> - The `--storage` value matches the storage output from the `log status` command of the log backup task. + +#### Step 2: Execute log compaction + +With the Base64-encoded storage, you can initiate the compaction using `tikv-ctl`. Note that the default log level of `tikv-ctl` is `warning`. Use `--log-level info` to obtain more detailed information: + +```shell +tikv-ctl --log-level info compact-log-backup \ + --from "" --until "" \ + -s 'bAsE64==' -N 8 +``` + +Parameter descriptions: + +- `-s`: the Base64-encoded storage string obtained earlier. +- `-N`: the maximum number of concurrent log compaction tasks. +- `--from`: the start timestamp for compaction. +- `--until`: the end timestamp for compaction. + +The `--from` and `--until` parameters define the time range for the compaction operation. The compaction operation handles all log files containing write operations within the specified time range, so the generated SST files might include data outside this range. + +To obtain the timestamp for a specific point in time, execute the following command: + +```shell +echo $(( $(date --date '2004-05-06 15:02:01Z' +%s%3N) << 18 )) +``` + +> **Note:** +> +> If you are a macOS user, you need to install `coreutils` via Homebrew and use `gdate` instead of `date`. +> +> ```shell +> echo $(( $(gdate --date '2004-05-06 15:02:01Z' +%s%3N) << 18 )) +> ``` diff --git a/br/br-pitr-guide.md b/br/br-pitr-guide.md index 27aecb8b9e196..556bad548d2a7 100644 --- a/br/br-pitr-guide.md +++ b/br/br-pitr-guide.md @@ -93,13 +93,17 @@ tiup br restore point --pd "${PD_IP}:2379" \ During data restore, you can view the progress through the progress bar in the terminal. The restore is divided into two phases, full restore and log restore (restore meta files and restore KV files). After each phase is completed, `br` outputs information such as restore time and data size. 
```shell -Full Restore <--------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00% +Split&Scatter Region <--------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00% +Download&Ingest SST <--------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00% +Restore Pipeline <--------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00% *** ["Full Restore success summary"] ****** [total-take=xxx.xxxs] [restore-data-size(after-compressed)=xxx.xxx] [Size=xxxx] [BackupTS={TS}] [total-kv=xxx] [total-kv-size=xxx] [average-speed=xxx] Restore Meta Files <--------------------------------------------------------------------------------------------------------------------------------------------------> 100.00% Restore KV Files <----------------------------------------------------------------------------------------------------------------------------------------------------> 100.00% *** ["restore log success summary"] [total-take=xxx.xx] [restore-from={TS}] [restore-to={TS}] [total-kv-count=xxx] [total-size=xxx] ``` +During data restore, the table mode of the target table is automatically set to `restore`. Tables in `restore` mode do not allow any read or write operations. After data restore is complete, the table mode automatically switches back to `normal`, and you can read and write the table normally. This mechanism ensures task stability and data consistency throughout the restore process. 
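If you want to confirm whether a restored table has switched back to normal mode, you can check the `TIDB_TABLE_MODE` column of `INFORMATION_SCHEMA.TABLES` (available starting from the version that introduces table modes). The following is a minimal sketch that runs the query through the MySQL client from the command line; the TiDB address, user, and the `test.usertable` table name are placeholder values for illustration:

```shell
# Check the current mode of a table during or after the restore:
# `Restore` means the table is still being restored and cannot be read or written;
# `Normal` means the restore of this table is complete and it is available again.
mysql -h "${TIDB_HOST}" -P 4000 -u root -p \
  -e "SELECT TABLE_SCHEMA, TABLE_NAME, TIDB_TABLE_MODE FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 'usertable';"
```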
+ ## Clean up outdated data As described in the [Usage Overview of TiDB Backup and Restore](/br/br-use-overview.md): diff --git a/br/br-pitr-manual.md b/br/br-pitr-manual.md index c873fb2789725..8586156dd0474 100644 --- a/br/br-pitr-manual.md +++ b/br/br-pitr-manual.md @@ -458,7 +458,9 @@ tiup br restore point --pd="${PD_IP}:2379" --storage='s3://backup-101/logbackup?access-key=${access-key}&secret-access-key=${secret-access-key}' --full-backup-storage='s3://backup-101/snapshot-202205120000?access-key=${access-key}&secret-access-key=${secret-access-key}' -Full Restore <--------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00% +Split&Scatter Region <--------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00% +Download&Ingest SST <--------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00% +Restore Pipeline <--------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00% *** ***["Full Restore success summary"] ****** [total-take=3.112928252s] [restore-data-size(after-compressed)=5.056kB] [Size=5056] [BackupTS=434693927394607136] [total-kv=4] [total-kv-size=290B] [average-speed=93.16B/s] Restore Meta Files <--------------------------------------------------------------------------------------------------------------------------------------------------> 100.00% Restore KV Files <----------------------------------------------------------------------------------------------------------------------------------------------------> 100.00% @@ -498,3 +500,156 @@ tiup br restore point --pd="${PD_IP}:2379" --master-key-crypter-method aes128-ctr --master-key "local:///path/to/master.key" ``` + +### Restore data using filters + +Starting from TiDB v8.5.5, you can use filters during PITR to restore specific databases or tables, enabling more fine-grained control over the data to be restored. + +The filter patterns follow the same [table filtering syntax](/table-filter.md) as other BR operations: + +- `'*.*'`: matches all databases and tables. +- `'db1.*'`: matches all tables in the database `db1`. +- `'db1.table1'`: matches the specific table `table1` in the database `db1`. +- `'db*.tbl*'`: matches databases starting with `db` and tables starting with `tbl`. +- `'!mysql.*'`: excludes all tables in the `mysql` database. 
+ +Usage examples: + +```shell +# restore specific databases +tiup br restore point --pd="${PD_IP}:2379" \ +--storage='s3://backup-101/logbackup?access-key=${ACCESS-KEY}&secret-access-key=${SECRET-ACCESS-KEY}' \ +--full-backup-storage='s3://backup-101/snapshot-20250602000000?access-key=${ACCESS-KEY}&secret-access-key=${SECRET-ACCESS-KEY}' \ +--start-ts "2025-06-02 00:00:00+0800" \ +--restored-ts "2025-06-03 18:00:00+0800" \ +--filter 'db1.*' --filter 'db2.*' + +# restore specific tables +tiup br restore point --pd="${PD_IP}:2379" \ +--storage='s3://backup-101/logbackup?access-key=${ACCESS-KEY}&secret-access-key=${SECRET-ACCESS-KEY}' \ +--full-backup-storage='s3://backup-101/snapshot-20250602000000?access-key=${ACCESS-KEY}&secret-access-key=${SECRET-ACCESS-KEY}' \ +--start-ts "2025-06-02 00:00:00+0800" \ +--restored-ts "2025-06-03 18:00:00+0800" \ +--filter 'db1.users' --filter 'db1.orders' + +# restore using pattern matching +tiup br restore point --pd="${PD_IP}:2379" \ +--storage='s3://backup-101/logbackup?access-key=${ACCESS-KEY}&secret-access-key=${SECRET-ACCESS-KEY}' \ +--full-backup-storage='s3://backup-101/snapshot-20250602000000?access-key=${ACCESS-KEY}&secret-access-key=${SECRET-ACCESS-KEY}' \ +--start-ts "2025-06-02 00:00:00+0800" \ +--restored-ts "2025-06-03 18:00:00+0800" \ +--filter 'db*.tbl*' +``` + +> **Note:** +> +> - Before restoring data using filters, ensure that the target cluster does not contain any databases or tables that match the filter. Otherwise, the restore will fail with an error. +> - The filter options apply during the restore phase for both snapshot and log backups. +> - You can specify multiple `--filter` options to include or exclude different patterns. +> - PITR filtering does not support system tables yet. If you need to restore specific system tables, use the `br restore full` command with filters instead. Note that this command restores only the snapshot backup data (not log backup data). +> - The regular expression in the restore task matches the table name at the `restored-ts` time point, with the following three possible cases: +> - Table A (table id = 1): the table name always matches the `--filter` regular expression at and before the `restored-ts` time point. In this case, PITR restores the table. +> - Table B (table id = 2): the table name does not match the `--filter` regular expression at some point before `restored-ts`, but matches at the `restored-ts` time point. In this case, PITR restores the table. +> - Table C (table id = 3): the table name matches the `--filter` regular expression at some point before `restored-ts`, but does **not** match at the `restored-ts` time point. In this case, PITR does **not** restore the table. +> - You can use the database and table filtering feature to restore part of the data online. During the online restore process, do **not** create databases or tables with the same names as the restored objects, otherwise the restore task fails due to conflicts. To avoid data inconsistency, the tables created by PITR during this restore process are not readable or writable until the restore task is complete. + +### Concurrent restore operations + +Starting from TiDB v8.5.5, you can run multiple PITR restore tasks concurrently. This feature allows you to restore different datasets in parallel, improving efficiency for large-scale restore scenarios. 
+ +Usage example for concurrent restores: + +```shell +# terminal 1 - restore database db1 +tiup br restore point --pd="${PD_IP}:2379" \ +--storage='s3://backup-101/logbackup?access-key=${ACCESS-KEY}&secret-access-key=${SECRET-ACCESS-KEY}' \ +--full-backup-storage='s3://backup-101/snapshot-20250602000000?access-key=${ACCESS-KEY}&secret-access-key=${SECRET-ACCESS-KEY}' \ +--start-ts "2025-06-02 00:00:00+0800" \ +--restored-ts "2025-06-03 18:00:00+0800" \ +--filter 'db1.*' + +# terminal 2 - restore database db2 (can run simultaneously) +tiup br restore point --pd="${PD_IP}:2379" \ +--storage='s3://backup-101/logbackup?access-key=${ACCESS-KEY}&secret-access-key=${SECRET-ACCESS-KEY}' \ +--full-backup-storage='s3://backup-101/snapshot-20250602000000?access-key=${ACCESS-KEY}&secret-access-key=${SECRET-ACCESS-KEY}' \ +--start-ts "2025-06-02 00:00:00+0800" \ +--restored-ts "2025-06-03 18:00:00+0800" \ +--filter 'db2.*' +``` + +> **Note:** +> +> - Each concurrent restore operation must target a different database or a non-overlapping set of tables. Attempting to restore overlapping datasets concurrently will result in an error. +> - Multiple restore tasks consume a lot of system resources. It is recommended to run concurrent restore tasks only when CPU and I/O resources are sufficient. + +### Compatibility between ongoing log backup and snapshot restore + +Starting from v8.5.5, when a log backup task is running, if all of the following conditions are met, you can still perform snapshot restore (`br restore [full|database|table]`) and allow the restored data to be properly recorded by the ongoing log backup (hereinafter referred to as "log backup"): + +- The node performing backup and restore operations has the following necessary permissions: + - Read access to the external storage containing the backup source, for snapshot restore + - Write access to the target external storage used by the log backup +- The target external storage for the log backup is Amazon S3 (`s3://`), Google Cloud Storage (`gcs://`), or Azure Blob Storage (`azblob://`). +- The data to be restored uses the same type of external storage as the target storage for the log backup. +- Neither the data to be restored nor the log backup has local encryption enabled. For details, see [log backup encryption](#encrypt-the-log-backup-data) and [snapshot backup encryption](/br/br-snapshot-manual.md#encrypt-the-backup-data). + +If any of the above conditions are not met, you can restore the data by following these steps: + +1. [Stop the log backup task](#stop-a-log-backup-task). +2. Perform the data restore. +3. After the restore is complete, perform a new snapshot backup. +4. [Restart the log backup task](#restart-a-log-backup-task). + +> **Note:** +> +> When restoring a log backup that contains records of snapshot (full) restore data, you must use BR v8.5.5 or later. Otherwise, restoring the recorded full restore data might fail. + +### Compatibility between ongoing log backup and PITR operations + +Starting from TiDB v8.5.5, you can perform PITR operations while a log backup task is running by default. The system automatically handles compatibility between these operations. + +#### Important limitation for PITR with ongoing log backup + +When you perform the PITR operations while a log backup is running, the restored data will also be recorded in the ongoing log backup. However, due to the nature of log restore operations, data inconsistencies might occur within the restore window. 
The system writes metadata to external storage to mark both the time range and data range where consistency cannot be guaranteed. + +If such inconsistency occurs during the time range `[t1, t2)`, you cannot directly restore data from this period. Instead, choose one of the following alternatives: + +- Restore data up to `t1` (to retrieve data before the inconsistent period). +- Perform a new snapshot backup after `t2`, and use it as the base for future PITR operations. + +### Abort restore operations + +If a restore operation fails, you can use the `tiup br abort` command to clean up registry entries and checkpoint data. This command automatically locates and removes relevant metadata based on the original restore parameters, including entries in the `mysql.tidb_restore_registry` table and checkpoint data (regardless of whether it is stored in a local database or external storage). + +> **Note:** +> +> The `abort` command only cleans up metadata. You need to manually delete any actual restored data from the cluster. + +The following examples show how to abort restore operations using the same parameters as the original restore command: + +```shell +# Abort a PITR operation +tiup br abort restore point --pd="${PD_IP}:2379" \ +--storage='s3://backup-101/logbackup?access-key=${ACCESS-KEY}&secret-access-key=${SECRET-ACCESS-KEY}' \ +--full-backup-storage='s3://backup-101/snapshot-20250602000000?access-key=${ACCESS-KEY}&secret-access-key=${SECRET-ACCESS-KEY}' + +# Abort a PITR operation with filters +tiup br abort restore point --pd="${PD_IP}:2379" \ +--storage='s3://backup-101/logbackup?access-key=${ACCESS-KEY}&secret-access-key=${SECRET-ACCESS-KEY}' \ +--full-backup-storage='s3://backup-101/snapshot-20250602000000?access-key=${ACCESS-KEY}&secret-access-key=${SECRET-ACCESS-KEY}' \ +--filter 'db1.*' + +# Abort a full restore +tiup br abort restore full --pd="${PD_IP}:2379" \ +--storage='s3://backup-101/snapshot-20250602000000?access-key=${ACCESS-KEY}&secret-access-key=${SECRET-ACCESS-KEY}' + +# Abort a database restore +tiup br abort restore db --pd="${PD_IP}:2379" \ +--storage='s3://backup-101/snapshot-20250602000000?access-key=${ACCESS-KEY}&secret-access-key=${SECRET-ACCESS-KEY}' \ +--db database_name + +# Abort a table restore +tiup br abort restore table --pd="${PD_IP}:2379" \ +--storage='s3://backup-101/snapshot-20250602000000?access-key=${ACCESS-KEY}&secret-access-key=${SECRET-ACCESS-KEY}' \ +--db database_name --table table_name +``` diff --git a/br/br-snapshot-guide.md b/br/br-snapshot-guide.md index 80e04a2207e4c..8bed8406dc767 100644 --- a/br/br-snapshot-guide.md +++ b/br/br-snapshot-guide.md @@ -36,10 +36,17 @@ In the preceding command: During backup, a progress bar is displayed in the terminal as shown below. When the progress bar advances to 100%, the backup task is completed and statistics such as total backup time, average backup speed, and backup data size are displayed. +- `total-ranges`: indicates the total number of files to be backed up. +- `ranges-succeed`: indicates the number of files that are successfully backed up. +- `ranges-failed`: indicates the number of files that failed to be backed up. +- `backup-total-ranges`: indicates the number of tables (including partitions) and indexes that are to be backed up. +- `write-CF-files`: indicates the number of backup SST files that contain `write CF` data. +- `default-CF-files`: indicates the number of backup SST files that contain `default CF` data. 
+ ```shell Full Backup <-------------------------------------------------------------------------------> 100.00% Checksum <----------------------------------------------------------------------------------> 100.00% -*** ["Full Backup success summary"] *** [backup-checksum=3.597416ms] [backup-fast-checksum=2.36975ms] *** [total-take=4.715509333s] [BackupTS=435844546560000000] [total-kv=1131] [total-kv-size=250kB] [average-speed=53.02kB/s] [backup-data-size(after-compressed)=71.33kB] [Size=71330] +*** ["Full Backup success summary"] *** [total-ranges=20] [ranges-succeed=20] [ranges-failed=0] [backup-checksum=3.597416ms] [backup-fast-checksum=2.36975ms] [backup-total-ranges=11] [backup-total-regions=10] [write-CF-files=14] [default-CF-files=6] [total-take=4.715509333s] [BackupTS=435844546560000000] [total-kv=1131] [total-kv-size=250kB] [average-speed=53.02kB/s] [backup-data-size(after-compressed)=71.33kB] [Size=71330] ``` ## Get the backup time point of a snapshot backup @@ -78,11 +85,25 @@ tiup br restore full --pd "${PD_IP}:2379" \ During restore, a progress bar is displayed in the terminal as shown below. When the progress bar advances to 100%, the restore task is completed and statistics such as total restore time, average restore speed, and total data size are displayed. +- `total-ranges`: indicates the total number of files that are to be restored. +- `ranges-succeed`: indicates the number of files that are successfully restored. +- `ranges-failed`: indicates the number of files that failed to be restored. +- `merge-ranges`: indicates the time taken to merge the data range. +- `split-region`: indicates the time taken to split and scatter Regions. +- `restore-files`: indicates the time TiKV takes to download and ingest SST files. +- `write-CF-files`: indicates the number of restored SST files that contain `write CF` data. +- `default-CF-files`: indicates the number of restored SST files that contain `default CF` data. +- `split-keys`: indicates the number of keys generated for splitting Regions. + ```shell -Full Restore <------------------------------------------------------------------------------> 100.00% -*** ["Full Restore success summary"] *** [total-take=4.344617542s] [total-kv=5] [total-kv-size=327B] [average-speed=75.27B/s] [restore-data-size(after-compressed)=4.813kB] [Size=4813] [BackupTS=435844901803917314] +Split&Scatter Region <--------------------------------------------------------------------> 100.00% +Download&Ingest SST <---------------------------------------------------------------------> 100.00% +Restore Pipeline <------------------------------------------------------------------------> 100.00% +*** ["Full Restore success summary"] [total-ranges=20] [ranges-succeed=20] [ranges-failed=0] [merge-ranges=7.546971ms] [split-region=343.594072ms] [restore-files=1.57662s] [default-CF-files=6] [write-CF-files=14] [split-keys=9] [total-take=4.344617542s] [total-kv=5] [total-kv-size=327B] [average-speed=75.27B/s] [restore-data-size(after-compressed)=4.813kB] [Size=4813] [BackupTS=435844901803917314] ``` +During data restore, the table mode of the target table is automatically set to `restore`. Tables in `restore` mode do not allow any read or write operations. After data restore is complete, the table mode automatically switches back to `normal`, and you can read and write the table normally. This mechanism ensures task stability and data consistency throughout the restore process. 
+ ### Restore a database or a table BR supports restoring partial data of a specified database or table from backup data. This feature allows you to filter out unwanted data and back up only a specific database or table. @@ -129,6 +150,7 @@ tiup br restore full \ - Starting from BR v5.1.0, when you back up snapshots, BR automatically backs up the **system tables** in the `mysql` schema, but does not restore these system tables by default. - Starting from v6.2.0, BR lets you specify `--with-sys-table` to restore **data in some system tables**. - Starting from v7.6.0, BR enables `--with-sys-table` by default, which means that BR restores **data in some system tables** by default. +- Starting from v8.5.5, BR introduces the `--fast-load-sys-tables` parameter to support physical restore of system tables. This parameter is enabled by default. This approach uses the `RENAME TABLE` DDL statement to atomically swap the system tables in the `__TiDB_BR_Temporary_mysql` database with the system tables in the `mysql` database. Unlike the logical restoration of system tables using the `REPLACE INTO` SQL statement, physical restoration completely overwrites the existing data in the system tables. **BR can restore data in the following system tables:** diff --git a/br/br-snapshot-manual.md b/br/br-snapshot-manual.md index c6805f6ec0342..eb3742c58f2a3 100644 --- a/br/br-snapshot-manual.md +++ b/br/br-snapshot-manual.md @@ -127,8 +127,21 @@ tiup br restore full \ --storage local:///br_data/ --pd "${PD_IP}:2379" --log-file restore.log ``` +> **Note:** +> +> Starting from v8.5.5, when the `--load-stats` parameter is set to `false`, BR no longer writes statistics for the restored tables to the `mysql.stats_meta` table. After the restore is complete, you can manually execute the [`ANALYZE TABLE`](/sql-statements/sql-statement-analyze-table.md) SQL statement to update the relevant statistics. + When the backup and restore feature backs up data, it stores statistics in JSON format within the `backupmeta` file. When restoring data, it loads statistics in JSON format into the cluster. For more information, see [LOAD STATS](/sql-statements/sql-statement-load-stats.md). +Starting from v8.5.5, BR introduces the `--fast-load-sys-tables` parameter, which is enabled by default. When restoring data to a new cluster using the `br` command-line tool, and the IDs of tables and partitions between the upstream and downstream clusters can be reused (otherwise, BR will automatically fall back to logically load statistics), enabling `--fast-load-sys-tables` lets BR to first restore the statistics-related system tables to the temporary system database `__TiDB_BR_Temporary_mysql`, and then atomically swap these tables with the corresponding tables in the `mysql` database using the `RENAME TABLE` statement. + +The following is an example: + +```shell +tiup br restore full \ +--storage local:///br_data/ --pd "${PD_IP}:2379" --log-file restore.log --load-stats --fast-load-sys-tables +``` + ## Encrypt the backup data BR supports encrypting backup data at the backup side and [at the storage side when backing up to Amazon S3](/br/backup-and-restore-storages.md#amazon-s3-server-side-encryption). You can choose either encryption method as required. @@ -176,9 +189,27 @@ In the preceding command: During restore, a progress bar is displayed in the terminal as shown below. When the progress bar advances to 100%, the restore task is completed. Then `br` will verify the restored data to ensure data security. 
```shell -Full Restore <---------/...............................................> 17.12%. +Split&Scatter Region <--------------------------------------------------------------------> 100.00% +Download&Ingest SST <---------------------------------------------------------------------> 100.00% +Restore Pipeline <-------------------------/...............................................> 17.12% ``` +Starting from TiDB v8.5.5, BR lets you specify `--fast-load-sys-tables` to restore statistics physically in a new cluster: + +```shell +tiup br restore full \ + --pd "${PD_IP}:2379" \ + --with-sys-table \ + --fast-load-sys-tables \ + --storage "s3://${backup_collection_addr}/snapshot-${date}?access-key=${access-key}&secret-access-key=${secret-access-key}" \ + --ratelimit 128 \ + --log-file restorefull.log +``` + +> **Note:** +> +> Unlike the logical restoration of system tables using the `REPLACE INTO` SQL statement, physical restoration completely overwrites the existing data in the system tables. + ## Restore a database or a table You can use `br` to restore partial data of a specified database or table from backup data. This feature allows you to filter out data that you do not need during the restore. diff --git a/configure-store-limit.md b/configure-store-limit.md index 33419c4ca28f0..c4802ac93e8c1 100644 --- a/configure-store-limit.md +++ b/configure-store-limit.md @@ -53,6 +53,13 @@ tiup ctl:v pd store limit all 5 add-peer // All stores tiup ctl:v pd store limit all 5 remove-peer // All stores can at most delete 5 peers per minute. ``` +Starting from v8.5.5, you can set the speed limit for removing-peer operations for all stores of a specific storage engine type, as shown in the following examples: + +```bash +tiup ctl:v pd store limit all engine tikv 5 remove-peer // All TiKV stores can at most remove 5 peers per minute. +tiup ctl:v pd store limit all engine tiflash 5 remove-peer // All TiFlash stores can at most remove 5 peers per minute. +``` + ### Set limit for a single store To set the speed limit for a single store, run the following commands: diff --git a/faq/backup-and-restore-faq.md b/faq/backup-and-restore-faq.md index 315e30c583a0c..6195e8d81428c 100644 --- a/faq/backup-and-restore-faq.md +++ b/faq/backup-and-restore-faq.md @@ -107,6 +107,14 @@ After you pause a log backup task, to prevent the MVCC data from being garbage c To address this problem, delete the current task using `br log stop`, and then create a log backup task using `br log start`. At the same time, you can perform a full backup for subsequent PITR. +### What should I do if the error message `[ddl:8204]invalid ddl job type: none` is returned when using the PITR table filter? + +```shell +failed to refresh meta for database with schemaID=124, dbName=pitr_test: [ddl:8204]invalid ddl job type: none +``` + +This error occurs because the TiDB node acting as the DDL Owner is running an outdated version that cannot recognize the Refresh Meta DDL. To resolve this issue, upgrade your cluster to v8.5.5 or later before using the PITR [table filter](/table-filter.md) feature. + ## Feature compatibility issues ### Why does data restored using br command-line tool cannot be replicated to the upstream cluster of TiCDC? diff --git a/identify-slow-queries.md b/identify-slow-queries.md index 2015f622c55cf..ce2dc0e10778a 100644 --- a/identify-slow-queries.md +++ b/identify-slow-queries.md @@ -167,6 +167,11 @@ Fields related to Resource Control: * `Request_unit_write`: the total write RUs consumed by the statement. 
* `Time_queued_by_rc`: the total time that the statement waits for available resources. +Fields related to storage engines: + +- `Storage_from_kv`: introduced in v8.5.5, indicates whether this statement read data from TiKV. +- `Storage_from_mpp`: introduced in v8.5.5, indicates whether this statement read data from TiFlash. + ## Related system variables * [`tidb_slow_log_threshold`](/system-variables.md#tidb_slow_log_threshold): Sets the threshold for the slow log. The SQL statement whose execution time exceeds this threshold is recorded in the slow log. The default value is 300 (ms). diff --git a/information-schema/information-schema-partitions.md b/information-schema/information-schema-partitions.md index 044474113732e..61510949922ab 100644 --- a/information-schema/information-schema-partitions.md +++ b/information-schema/information-schema-partitions.md @@ -45,8 +45,9 @@ The output is as follows: | TABLESPACE_NAME | varchar(64) | YES | | NULL | | | TIDB_PARTITION_ID | bigint(21) | YES | | NULL | | | TIDB_PLACEMENT_POLICY_NAME | varchar(64) | YES | | NULL | | +| TIDB_AFFINITY | varchar(128) | YES | | NULL | | +-------------------------------+--------------+------+------+---------+-------+ -27 rows in set (0.00 sec) +28 rows in set (0.00 sec) ``` ```sql @@ -85,6 +86,7 @@ SUBPARTITION_ORDINAL_POSITION: NULL TABLESPACE_NAME: NULL TIDB_PARTITION_ID: 89 TIDB_PLACEMENT_POLICY_NAME: NULL + TIDB_AFFINITY: NULL *************************** 2. row *************************** TABLE_CATALOG: def TABLE_SCHEMA: test @@ -113,6 +115,7 @@ SUBPARTITION_ORDINAL_POSITION: NULL TABLESPACE_NAME: NULL TIDB_PARTITION_ID: 90 TIDB_PLACEMENT_POLICY_NAME: NULL + TIDB_AFFINITY: NULL 2 rows in set (0.00 sec) ``` diff --git a/information-schema/information-schema-tables.md b/information-schema/information-schema-tables.md index 8ae5df9a46bf1..77eb125c82bde 100644 --- a/information-schema/information-schema-tables.md +++ b/information-schema/information-schema-tables.md @@ -41,8 +41,12 @@ DESC tables; | TABLE_COMMENT | varchar(2048) | YES | | NULL | | | TIDB_TABLE_ID | bigint(21) | YES | | NULL | | | TIDB_ROW_ID_SHARDING_INFO | varchar(255) | YES | | NULL | | +| TIDB_PK_TYPE | varchar(64) | YES | | NULL | | +| TIDB_PLACEMENT_POLICY_NAME | varchar(64) | YES | | NULL | | +| TIDB_TABLE_MODE | varchar(16) | YES | | NULL | | +| TIDB_AFFINITY | varchar(128) | YES | | NULL | | +---------------------------+---------------+------+------+----------+-------+ -23 rows in set (0.00 sec) +27 rows in set (0.00 sec) ``` {{< copyable "sql" >}} @@ -72,10 +76,14 @@ SELECT * FROM tables WHERE table_schema='mysql' AND table_name='user'\G CHECK_TIME: NULL TABLE_COLLATION: utf8mb4_bin CHECKSUM: NULL - CREATE_OPTIONS: - TABLE_COMMENT: + CREATE_OPTIONS: + TABLE_COMMENT: TIDB_TABLE_ID: 5 TIDB_ROW_ID_SHARDING_INFO: NULL + TIDB_PK_TYPE: CLUSTERED +TIDB_PLACEMENT_POLICY_NAME: NULL + TIDB_TABLE_MODE: Normal + TIDB_AFFINITY: NULL 1 row in set (0.00 sec) ``` @@ -115,7 +123,7 @@ The description of columns in the `TABLES` table is as follows: * `CREATE_OPTIONS`: Creates options. * `TABLE_COMMENT`: The comments and notes of the table. -Most of the information in the table is the same as MySQL. Only two columns are newly defined by TiDB: +Most of the information in the table is the same as MySQL. The following columns are newly defined by TiDB: * `TIDB_TABLE_ID`: to indicate the internal ID of a table. This ID is unique in a TiDB cluster. * `TIDB_ROW_ID_SHARDING_INFO`: to indicate the sharding type of a table. 
The possible values are as follows: @@ -123,4 +131,8 @@ Most of the information in the table is the same as MySQL. Only two columns are - `"NOT_SHARDED(PK_IS_HANDLE)"`: the table that defines an integer Primary Key as its row id is not sharded. - `"PK_AUTO_RANDOM_BITS={bit_number}"`: the table that defines an integer Primary Key as its row id is sharded because the Primary Key is assigned with `AUTO_RANDOM` attribute. - `"SHARD_BITS={bit_number}"`: the table is sharded using `SHARD_ROW_ID_BITS={bit_number}`. - - NULL: the table is a system table or view, and thus cannot be sharded. + - `NULL`: the table is a system table or view, and thus cannot be sharded. +* `TIDB_PK_TYPE`: the primary key type of the table. Possible values include `CLUSTERED` (clustered primary key) and `NONCLUSTERED` (non-clustered primary key). +* `TIDB_PLACEMENT_POLICY_NAME`: the name of the placement policy applied to the table. +* `TIDB_TABLE_MODE`: the mode of the table, for example, `Normal`, `Import`, or `Restore`. +* `TIDB_AFFINITY`: the affinity level of the table. It is `table` for non-partitioned tables, `partition` for partitioned tables, and `NULL` when affinity is not enabled. diff --git a/optimizer-hints.md b/optimizer-hints.md index f4c71e5433903..4aa063f893bbf 100644 --- a/optimizer-hints.md +++ b/optimizer-hints.md @@ -474,6 +474,60 @@ EXPLAIN SELECT /*+ NO_ORDER_INDEX(t, a) */ a FROM t ORDER BY a LIMIT 10; The same as the example of `ORDER_INDEX` hint, the optimizer generates two types of plans for this query: `Limit + IndexScan(keep order: true)` and `TopN + IndexScan(keep order: false)`. When the `NO_ORDER_INDEX` hint is used, the optimizer will choose the latter plan to read the index out of order. +### INDEX_LOOKUP_PUSHDOWN(t1_name, idx1_name [, idx2_name ...]) New in v8.5.5 + +The `INDEX_LOOKUP_PUSHDOWN(t1_name, idx1_name [, idx2_name ...])` hint instructs the optimizer to access the specified table using only the specified indexes and push down the `IndexLookUp` operator to TiKV for execution. + +The following example shows the execution plan generated when using this hint: + +```sql +CREATE TABLE t1(a INT, b INT, KEY(a)); +EXPLAIN SELECT /*+ INDEX_LOOKUP_PUSHDOWN(t1, a) */ a, b FROM t1; +``` + +```sql ++-----------------------------+----------+-----------+----------------------+--------------------------------+ +| id | estRows | task | access object | operator info | ++-----------------------------+----------+-----------+----------------------+--------------------------------+ +| IndexLookUp_7 | 10000.00 | root | | | +| ├─LocalIndexLookUp(Build) | 10000.00 | cop[tikv] | | index handle offsets:[1] | +| │ ├─IndexFullScan_5(Build) | 10000.00 | cop[tikv] | table:t1, index:a(a) | keep order:false, stats:pseudo | +| │ └─TableRowIDScan_8(Probe) | 10000.00 | cop[tikv] | table:t1 | keep order:false, stats:pseudo | +| └─TableRowIDScan_6(Probe) | 0.00 | cop[tikv] | table:t1 | keep order:false, stats:pseudo | ++-----------------------------+----------+-----------+----------------------+--------------------------------+ +``` + +When you use the `INDEX_LOOKUP_PUSHDOWN` hint, the outermost Build operator on the TiDB side in the original execution plan is replaced with `LocalIndexLookUp` and pushed down to TiKV for execution. While scanning the index, TiKV attempts to perform a table lookup locally to read the corresponding row data. Because the index and row data might be distributed across different Regions, requests pushed down to TiKV might not cover all target rows. 
As a result, the execution plan still retains the `TableRowIDScan` operator on the TiDB side to fetch rows that are not hit on the TiKV side. + +The `INDEX_LOOKUP_PUSHDOWN` hint currently has the following limitations: + +- Cached tables and temporary tables are not supported. +- Queries using [global indexes](/global-indexes.md) are not supported. +- Queries using [multi-valued indexes](/choose-index.md#use-multi-valued-indexes) are not supported. +- Isolation levels other than `REPEATABLE-READ` are not supported. +- [Follower Read](/follower-read.md) is not supported. +- [Stale Read](/stale-read.md) and [reading historical data using `tidb_snapshot`](/read-historical-data.md) are not supported. +- The pushed-down `LocalIndexLookUp` operator does not support `keep order`. If the execution plan includes an `ORDER BY` based on index columns, the query falls back to a regular `IndexLookUp`. +- The pushed-down `LocalIndexLookUp` operator does not support sending Coprocessor requests in paging mode. +- The pushed-down `LocalIndexLookUp` operator does not support [Coprocessor Cache](/coprocessor-cache.md). + +### NO_INDEX_LOOKUP_PUSHDOWN(t1_name) New in v8.5.5 + +The `NO_INDEX_LOOKUP_PUSHDOWN(t1_name)` hint explicitly disables the `IndexLookUp` pushdown for a specified table. This hint is typically used with the [`tidb_index_lookup_pushdown_policy`](/system-variables.md#tidb_index_lookup_pushdown_policy-new-in-v855) system variable. When the value of this variable is `force` or `affinity-force`, you can use this hint to prevent `IndexLookUp` pushdown for specific tables. + +The following example sets the `tidb_index_lookup_pushdown_policy` variable to `force`, which automatically enables pushdown for all `IndexLookUp` operators in the current session. If you specify the `NO_INDEX_LOOKUP_PUSHDOWN` hint in a query, `IndexLookUp` is not pushed down for the corresponding table: + +```sql +SET @@tidb_index_lookup_pushdown_policy = 'force'; + +-- The IndexLookUp operator will not be pushed down. +SELECT /*+ NO_INDEX_LOOKUP_PUSHDOWN(t) */ * FROM t WHERE a > 1; +``` + +> **Note:** +> +> `NO_INDEX_LOOKUP_PUSHDOWN` takes precedence over [`INDEX_LOOKUP_PUSHDOWN`](#index_lookup_pushdownt1_name-idx1_name--idx2_name--new-in-v855). When you specify both hints in the same query, `NO_INDEX_LOOKUP_PUSHDOWN` takes effect. + ### AGG_TO_COP() The `AGG_TO_COP()` hint tells the optimizer to push down the aggregate operation in the specified query block to the coprocessor. If the optimizer does not push down some aggregate function that is suitable for pushdown, then it is recommended to use this hint. For example: diff --git a/pd-configuration-file.md b/pd-configuration-file.md index f0bb6f1822f5b..6a9beacb34213 100644 --- a/pd-configuration-file.md +++ b/pd-configuration-file.md @@ -292,6 +292,13 @@ Configuration items related to scheduling + Specifies the upper limit of the `Region Merge` key. When the Region key is greater than the specified value, the PD does not merge the Region with its adjacent Regions. + Default value: `540000`. Before v8.4.0, the default value is `200000`. Starting from v8.4.0, the default value is `540000`. +### `max-affinity-merge-region-size` New in v8.5.5 + ++ Controls the threshold for automatically merging small adjacent Regions that belong to the same [affinity](/table-affinity.md) group. 
When a Region belongs to an affinity group and its size is smaller than this threshold, PD attempts to merge this Region with other small adjacent Regions in the same affinity group to reduce the number of Regions and maintain the affinity effect. ++ Setting it to `0` disables the automatic merging of small adjacent Regions within an affinity group. ++ Default value: `256` ++ Unit: MiB + ### `patrol-region-interval` + Controls the running frequency at which the checker inspects the health state of a Region. The smaller this value is, the faster the checker runs. Normally, you do not need to adjust this configuration. @@ -372,6 +379,11 @@ Configuration items related to scheduling + The number of the `Region Merge` scheduling tasks performed at the same time. Set this parameter to `0` to disable `Region Merge`. + Default value: `8` +### `affinity-schedule-limit` New in v8.5.5 + ++ Controls the number of [affinity](/table-affinity.md) scheduling tasks that can be performed concurrently. Setting it to `0` disables affinity scheduling. ++ Default value: `0` + ### `high-space-ratio` + The threshold ratio below which the capacity of the store is sufficient. If the space occupancy ratio of the store is smaller than this threshold value, PD ignores the remaining space of the store when performing scheduling, and balances load mainly based on the Region size. This configuration takes effect only when `region-score-formula-version` is set to `v1`. diff --git a/pd-control.md b/pd-control.md index 82bcd7fc6123c..fe6a12d648e7d 100644 --- a/pd-control.md +++ b/pd-control.md @@ -940,7 +940,7 @@ Usage: >> scheduler config evict-leader-scheduler // Display the stores in which the scheduler is located since v4.0.0 >> scheduler config evict-leader-scheduler add-store 2 // Add leader eviction scheduling for store 2 >> scheduler config evict-leader-scheduler delete-store 2 // Remove leader eviction scheduling for store 2 ->> scheduler add evict-slow-store-scheduler // When there is one and only one slow store, evict all Region leaders of that store +>> scheduler add evict-slow-store-scheduler // Automatically detect slow-disk or slow-network nodes and evict all Region leaders from those nodes when specific conditions are met >> scheduler remove grant-leader-scheduler-1 // Remove the corresponding scheduler, and `-1` corresponds to the store ID >> scheduler pause balance-region-scheduler 10 // Pause the balance-region scheduler for 10 seconds >> scheduler pause all 10 // Pause all schedulers for 10 seconds @@ -964,6 +964,44 @@ The state of the scheduler can be one of the following: - `pending`: the scheduler cannot generate scheduling operators. For a scheduler in the `pending` state, brief diagnostic information is returned. The brief information describes the state of stores and explains why these stores cannot be selected for scheduling. - `normal`: there is no need to generate scheduling operators. +### `scheduler config evict-slow-store-scheduler` + +The `evict-slow-store-scheduler` limits PD from scheduling Leaders to abnormal TiKV nodes and actively evicts Leaders when necessary, thereby reducing the impact of slow nodes on the cluster when TiKV nodes experience disk I/O or network jitter. + +#### Slow-disk nodes + +Starting from v6.2.0, TiKV reports a `SlowScore` in store heartbeats to PD. This score is calculated based on disk I/O conditions and ranges from 1 to 100. A higher value indicates a higher possibility of disk performance anomalies on that node. 
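+
+You can check whether `evict-slow-store-scheduler` is running and view its parameters with pd-ctl at any time. The following is a minimal sketch that reuses the `scheduler show` and `scheduler config` subcommands described elsewhere on this page; the actual output depends on your cluster:
+
+```bash
+>> scheduler show                              // Check whether evict-slow-store-scheduler is among the running schedulers
+>> scheduler config evict-slow-store-scheduler // View the current parameters of the scheduler
+```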
+ +For slow-disk nodes, the detection on TiKV and the scheduling via `evict-slow-store-scheduler` on PD are enabled by default, which means no additional configuration is required. + +#### Slow-network nodes + +Starting from v8.5.5, TiKV supports reporting a `NetworkSlowScore` in store heartbeats to PD. It is calculated based on network detection results and helps identify slow nodes experiencing network jitter. The score ranges from 1 to 100, where a higher value indicates a higher possibility of network anomalies. + +For compatibility and resource consumption considerations, the detection and scheduling of slow-network nodes are disabled by default. To enable them, configure both of the following: + +1. Enable the PD scheduler to handle slow-network nodes: + + ```bash + scheduler config evict-slow-store-scheduler set enable-network-slow-store true + ``` + +2. On TiKV, set the [`raftstore.inspect-network-interval`](/tikv-configuration-file.md#inspect-network-interval-new-in-v855) configuration item to a value greater than `0` to enable network detection. + +#### Recovery time control + +You can specify how long a slow node must remain stable before it is considered recovered by using the `recovery-duration` parameter. + +Example: + +```bash +>> scheduler config evict-slow-store-scheduler +{ + "recovery-duration": "1800" // 30 minutes +} +>> scheduler config evict-slow-store-scheduler set recovery-duration 600 +``` + ### `scheduler config balance-leader-scheduler` Use this command to view and control the `balance-leader-scheduler` policy. @@ -1235,15 +1273,17 @@ store weight 1 5 10 You can set the scheduling speed of stores by using `store limit`. For more details about the principles and usage of `store limit`, see [`store limit`](/configure-store-limit.md). 
```bash ->> store limit // Show the speed limit of adding-peer operations and the limit of removing-peer operations per minute in all stores ->> store limit add-peer // Show the speed limit of adding-peer operations per minute in all stores ->> store limit remove-peer // Show the limit of removing-peer operations per minute in all stores ->> store limit all 5 // Set the limit of adding-peer operations to 5 and the limit of removing-peer operations to 5 per minute for all stores ->> store limit 1 5 // Set the limit of adding-peer operations to 5 and the limit of removing-peer operations to 5 per minute for store 1 ->> store limit all 5 add-peer // Set the limit of adding-peer operations to 5 per minute for all stores ->> store limit 1 5 add-peer // Set the limit of adding-peer operations to 5 per minute for store 1 ->> store limit 1 5 remove-peer // Set the limit of removing-peer operations to 5 per minute for store 1 ->> store limit all 5 remove-peer // Set the limit of removing-peer operations to 5 per minute for all stores +>> store limit // Show the speed limit of adding-peer operations and the limit of removing-peer operations per minute in all stores +>> store limit add-peer // Show the speed limit of adding-peer operations per minute in all stores +>> store limit remove-peer // Show the limit of removing-peer operations per minute in all stores +>> store limit all 5 // Set the limit of adding-peer operations to 5 and the limit of removing-peer operations to 5 per minute for all stores +>> store limit 1 5 // Set the limit of adding-peer operations to 5 and the limit of removing-peer operations to 5 per minute for store 1 +>> store limit all 5 add-peer // Set the limit of adding-peer operations to 5 per minute for all stores +>> store limit 1 5 add-peer // Set the limit of adding-peer operations to 5 per minute for store 1 +>> store limit 1 5 remove-peer // Set the limit of removing-peer operations to 5 per minute for store 1 +>> store limit all 5 remove-peer // Set the limit of removing-peer operations to 5 per minute for all stores +>> store limit all engine tikv 5 remove-peer // Starting from v8.5.5, you can set the speed limit of removing-peer operations for all TiKV stores. This example sets the speed limit of removing-peer operations for all TiKV stores to 5 per minute. +>> store limit all engine tiflash 5 remove-peer // Starting from v8.5.5, you can set the speed limit of removing-peer operations for all TiFlash stores. This example sets the speed limit of removing-peer operations for all TiFlash stores to 5 per minute. ``` > **Note:** diff --git a/sql-statements/sql-statement-alter-table.md b/sql-statements/sql-statement-alter-table.md index 1a762291e1bc4..2ec218bd130dd 100644 --- a/sql-statements/sql-statement-alter-table.md +++ b/sql-statements/sql-statement-alter-table.md @@ -54,6 +54,7 @@ AlterTableSpec ::= | TTLEnable EqOpt ( 'ON' | 'OFF' ) | TTLJobInterval EqOpt stringLit ) +| 'AFFINITY' EqOpt stringLit | PlacementPolicyOption PlacementPolicyOption ::= @@ -181,6 +182,8 @@ The following major restrictions apply to `ALTER TABLE` in TiDB: - Changes of some data types (for example, some TIME, Bit, Set, Enum, and JSON types) are not supported due to the compatibility issues of the `CAST` function's behavior between TiDB and MySQL. +- The `AFFINITY` option is a TiDB extension syntax. After `AFFINITY` is enabled for a table, you cannot modify the partition scheme of that table, such as adding, dropping, reorganizing, or swapping partitions. 
To modify the partition scheme, you must first remove `AFFINITY`. + - Spatial data types are not supported. - `ALTER TABLE t CACHE | NOCACHE` is a TiDB extension to MySQL syntax. For details, see [Cached Tables](/cached-tables.md). diff --git a/sql-statements/sql-statement-create-table.md b/sql-statements/sql-statement-create-table.md index 9029ac4d9bc3b..dd3e775e6ca63 100644 --- a/sql-statements/sql-statement-create-table.md +++ b/sql-statements/sql-statement-create-table.md @@ -117,6 +117,7 @@ TableOption ::= | 'UNION' EqOpt '(' TableNameListOpt ')' | 'ENCRYPTION' EqOpt EncryptionOpt | 'TTL' EqOpt TimeColumnName '+' 'INTERVAL' Expression TimeUnit (TTLEnable EqOpt ( 'ON' | 'OFF' ))? (TTLJobInterval EqOpt stringLit)? +| 'AFFINITY' EqOpt StringName | PlacementPolicyOption OnCommitOpt ::= @@ -168,13 +169,16 @@ The following *table_options* are supported. Other options such as `AVG_ROW_LENG |`AUTO_ID_CACHE`| To set the auto ID cache size in a TiDB instance. By default, TiDB automatically changes this size according to allocation speed of auto ID |`AUTO_ID_CACHE` = 200 | |`AUTO_RANDOM_BASE`| To set the initial incremental part value of auto_random. This option can be considered as a part of the internal interface. Users can ignore this parameter |`AUTO_RANDOM_BASE` = 0| | `CHARACTER SET` | To specify the [character set](/character-set-and-collation.md) for the table | `CHARACTER SET` = 'utf8mb4' | +| `COLLATE` | To specify the character set collation for the table | `COLLATE` = 'utf8mb4_bin' | | `COMMENT` | The comment information | `COMMENT` = 'comment info' | +| `AFFINITY` | To enable affinity scheduling for a table or partition. It can be set to `'table'` for non-partitioned tables and `'partition'` for partitioned tables. Setting it to `'none'` or leaving it empty disables affinity scheduling. | `AFFINITY` = 'table' | > **Note:** > -> The `split-table` configuration option is enabled by default. When it is enabled, a separate Region is created for each newly created table. For details, see [TiDB configuration file](/tidb-configuration-file.md). +> - The `split-table` configuration option is enabled by default. When it is enabled, a separate Region is created for each newly created table. For details, see [TiDB configuration file](/tidb-configuration-file.md). +> - Before using `AFFINITY`, note that modifying the partitioning scheme (such as adding, dropping, reorganizing, or swapping partitions) of a table with affinity enabled is not supported, and configuring `AFFINITY` on temporary tables or views is not supported. @@ -182,7 +186,8 @@ The following *table_options* are supported. Other options such as `AVG_ROW_LENG > **Note:** > -> TiDB creates a separate Region for each newly created table. +> - TiDB creates a separate Region for each newly created table. +> - Before using `AFFINITY`, note that modifying the partitioning scheme (such as adding, dropping, reorganizing, or swapping partitions) of a table with affinity enabled is not supported, and configuring `AFFINITY` on temporary tables or views is not supported. diff --git a/sql-statements/sql-statement-show-affinity.md b/sql-statements/sql-statement-show-affinity.md new file mode 100644 index 0000000000000..f9dabf5c4a59c --- /dev/null +++ b/sql-statements/sql-statement-show-affinity.md @@ -0,0 +1,61 @@ +--- +title: SHOW AFFINITY +summary: An overview of the usage of SHOW AFFINITY for the TiDB database. 
+--- + +# SHOW AFFINITY New in v8.5.5 + +The `SHOW AFFINITY` statement shows [affinity](/table-affinity.md) scheduling information for tables configured with the `AFFINITY` option, as well as the target replica distribution currently recorded by PD. + +## Synopsis + +```ebnf+diagram +ShowAffinityStmt ::= + "SHOW" "AFFINITY" ShowLikeOrWhereOpt +``` + +`SHOW AFFINITY` supports filtering table names using `LIKE` or `WHERE` clauses. + +## Examples + +The following examples create two tables with affinity scheduling enabled and show how to view their scheduling information: + +```sql +CREATE TABLE t1 (a INT) AFFINITY = 'table'; +CREATE TABLE tp1 (a INT) AFFINITY = 'partition' PARTITION BY HASH(a) PARTITIONS 2; + +SHOW AFFINITY; +``` + +The example output is as follows: + +```sql ++---------+------------+----------------+-----------------+------------------+----------+--------------+----------------------+ +| Db_name | Table_name | Partition_name | Leader_store_id | Voter_store_ids | Status | Region_count | Affinity_region_count| ++---------+------------+----------------+-----------------+------------------+----------+--------------+----------------------+ +| test | t1 | NULL | 1 | 1,2,3 | Stable | 8 | 8 | +| test | tp1 | p0 | 4 | 4,5,6 | Preparing| 4 | 2 | +| test | tp1 | p1 | 4 | 4,5,6 | Preparing| 3 | 2 | ++---------+------------+----------------+-----------------+------------------+----------+--------------+----------------------+ +``` + +The meaning of each column is as follows: + +- `Leader_store_id`, `Voter_store_ids`: the IDs of TiKV stores recorded by PD, indicating which stores host the target Leader and Voter replicas for the table or partitions. If the target replica locations for the affinity group are not determined, or if [`schedule.affinity-schedule-limit`](/pd-configuration-file.md#affinity-schedule-limit-new-in-v855) is set to `0`, the value is displayed as `NULL`. +- `Status`: indicates the current status of affinity scheduling. Possible values are: + - `Pending`: PD has not started affinity scheduling for the table or partition, such as when Leaders or Voters are not yet determined. + - `Preparing`: PD is scheduling Regions to meet affinity requirements. + - `Stable`: all Regions have reached the target distribution. +- `Region_count`: the current number of Regions in the affinity group. +- `Affinity_region_count`: the number of Regions that currently meet the affinity replica distribution requirements. + - When `Affinity_region_count` is less than `Region_count`, it indicates that some Regions have not yet completed replica scheduling based on affinity. + - When `Affinity_region_count` equals `Region_count`, it indicates that replica scheduling based on affinity is complete, meaning the distribution of all related Regions meets the affinity requirements. However, this does not indicate that related Region merge operations are complete. + +## MySQL compatibility + +This statement is a TiDB extension to MySQL syntax. + +## See also + +- [`CREATE TABLE`](/sql-statements/sql-statement-create-table.md) +- [`ALTER TABLE`](/sql-statements/sql-statement-alter-table.md) \ No newline at end of file diff --git a/statement-summary-tables.md b/statement-summary-tables.md index ca7ff75e0a609..80d4a0d0b6d01 100644 --- a/statement-summary-tables.md +++ b/statement-summary-tables.md @@ -455,6 +455,11 @@ Fields related to Resource Control: - `MAX_QUEUED_RC_TIME`: the maximum waiting time for available RU when executing SQL statements. 
- `RESOURCE_GROUP`: the resource group bound to SQL statements. +Fields related to storage engines: + +- `STORAGE_KV`: introduced in v8.5.5, indicates whether the previous execution of SQL statements of this category read data from TiKV. +- `STORAGE_MPP`: introduced in v8.5.5, indicates whether the previous execution of SQL statements of this category read data from TiFlash. + ### `statements_summary_evicted` fields description - `BEGIN_TIME`: Records the starting time. diff --git a/system-variables.md b/system-variables.md index 2c3b95d31d929..d892f34eebbd4 100644 --- a/system-variables.md +++ b/system-variables.md @@ -1012,6 +1012,16 @@ mysql> SHOW GLOBAL VARIABLES LIKE 'max_prepared_stmt_count'; - Unit: Bytes - This variable is used to control the threshold at which the TiDB server prefers to send read requests to a replica in the same availability zone as the TiDB server when [`tidb_replica_read`](#tidb_replica_read-new-in-v40) is set to `closest-adaptive`. If the estimated result is higher than or equal to this threshold, TiDB prefers to send read requests to a replica in the same availability zone. Otherwise, TiDB sends read requests to the leader replica. +### tidb_advancer_check_point_lag_limit New in v8.5.5 + +- Scope: GLOBAL +- Persists to cluster: Yes +- Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): No +- Type: Duration +- Default value: `48h0m0s` +- Range: `[1s, 8760h0m0s]` +- This variable controls the maximum allowed checkpoint lag for a log backup task. If a task's checkpoint lag exceeds this limit, TiDB Advancer pauses the task. + ### tidb_allow_tiflash_cop New in v7.3.0 - Scope: SESSION | GLOBAL @@ -3523,6 +3533,19 @@ For a system upgraded to v5.0 from an earlier version, if you have not modified - This variable is used to set the concurrency of the `index lookup join` algorithm. - A value of `-1` means that the value of `tidb_executor_concurrency` will be used instead. +### tidb_index_lookup_pushdown_policy New in v8.5.5 + +- Scope: SESSION | GLOBAL +- Persists to cluster: Yes +- Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): Yes +- Type: Enumeration +- Default value: `hint-only` +- Value options: `hint-only`, `affinity-force`, `force` +- This variable controls whether and when TiDB pushes the `IndexLookUp` operator down to TiKV. The value options are as follows: + - `hint-only` (default): TiDB pushes the `IndexLookUp` operator down to TiKV only when the [`INDEX_LOOKUP_PUSHDOWN`](/optimizer-hints.md#index_lookup_pushdownt1_name-idx1_name--idx2_name--new-in-v855) hint is explicitly specified in the SQL statement. + - `affinity-force`: TiDB automatically enables pushdown only for tables that are configured with the `AFFINITY` option. + - `force`: TiDB enables `IndexLookUp` pushdown for all tables. + ### tidb_index_merge_intersection_concurrency New in v6.5.0 - Scope: SESSION | GLOBAL @@ -6320,6 +6343,15 @@ For details, see [Identify Slow Queries](/identify-slow-queries.md). > - `PARALLEL` and `PARALLEL-FAST` modes are incompatible with [`tidb_tso_client_batch_max_wait_time`](#tidb_tso_client_batch_max_wait_time-new-in-v530) and [`tidb_enable_tso_follower_proxy`](#tidb_enable_tso_follower_proxy-new-in-v530). 
If either [`tidb_tso_client_batch_max_wait_time`](#tidb_tso_client_batch_max_wait_time-new-in-v530) is set to a non-zero value or [`tidb_enable_tso_follower_proxy`](#tidb_enable_tso_follower_proxy-new-in-v530) is enabled, configuring `tidb_tso_client_rpc_mode` does not take effect, and TiDB always works in `DEFAULT` mode. > - `PARALLEL` and `PARALLEL-FAST` modes are designed to reduce the average time for retrieving TS in TiDB. In situations with significant latency fluctuations, such as long-tail latency or latency spikes, these two modes might not provide any remarkable performance improvements. +### tidb_cb_pd_metadata_error_rate_threshold_ratio New in v8.5.5 + +- Scope: GLOBAL +- Persists to cluster: Yes +- Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): No +- Default value: `0` +- Range: `[0, 1]` +- This variable controls when TiDB triggers the circuit breaker. Setting a value of `0` (default) disables the circuit breaker. Setting a value between `0.01` and `1` enables it, causing the circuit breaker to trigger when the error rate of specific requests sent to PD reaches or exceeds the threshold. + ### tidb_ttl_delete_rate_limit New in v6.5.0 > **Note:** diff --git a/table-affinity.md b/table-affinity.md new file mode 100644 index 0000000000000..843d7ca7665c9 --- /dev/null +++ b/table-affinity.md @@ -0,0 +1,109 @@ +--- +title: Table-Level Data Affinity +summary: Learn how to configure affinity constraints for tables or partitions to control Region replica distribution and how to view the scheduling status. +--- + +# Table-Level Data Affinity New in v8.5.5 + +> **Warning:** +> +> This feature is experimental. It is not recommended that you use it in the production environment. It might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. + +Table-level data affinity is a PD mechanism for scheduling data distribution at the table level. This mechanism controls how Leader and Voter replicas for Regions of the same table or partition are distributed across a TiKV cluster. + +When you enable PD affinity scheduling and set the `AFFINITY` option of a table to `table` or `partition`, PD groups Regions belonging to the same table or partition into the same affinity group. During scheduling, PD prioritizes placing the Leader and Voter replicas of these Regions on the same subset of a few TiKV nodes. This reduces network latency caused by cross-node access during queries, thereby improving query performance. + +## Limitations + +Before using table-level data affinity, note the following limitations: + +- This feature does not take effect in [PD Microservices Mode](/pd-microservices.md). +- This feature does not work with [Temporary tables](/temporary-tables.md) and [views](/views.md). +- After data affinity is configured for a [partitioned table](/partitioned-table.md), **modifying the table partitioning scheme is not supported**, including adding, dropping, reorganizing, or swapping partitions. To change the partitioning scheme, you must first remove the affinity configuration for that table. +- **Evaluate disk capacity in advance for large data volumes**: after affinity is enabled, PD prioritizes scheduling Regions of a table or partition to the same subset of a few TiKV nodes. For tables or partitions with large data volumes, this might significantly increase disk usage on these nodes. It is recommended to evaluate disk capacity and monitor it in advance. 
+- Data affinity affects only the distribution of Leader and Voter replicas. If a table has Learner replicas (such as TiFlash), their distribution is not affected by affinity settings. + +## Prerequisites + +PD affinity scheduling is disabled by default. Before setting affinity for tables or partitions, you must enable and configure this feature. + +1. Set the PD configuration item [`schedule.affinity-schedule-limit`](/pd-configuration-file.md#affinity-schedule-limit-new-in-v855) to a value greater than `0` to enable affinity scheduling. + + For example, the following command sets the value to `4`, allowing PD to run up to four affinity scheduling tasks concurrently: + + ```bash + pd-ctl config set schedule.affinity-schedule-limit 4 + ``` + +2. (Optional) Modify the PD configuration item [`schedule.max-affinity-merge-region-size`](/pd-configuration-file.md#max-affinity-merge-region-size-new-in-v855) as needed. The default value is `256` MiB. It controls the size threshold for automatically merging adjacent small Regions within the same affinity group. Setting it to `0` disables the automatic merging of adjacent small Regions within affinity groups. + +## Usage + +This section describes how to configure affinity for tables or partitions and how to view affinity scheduling status. + +### Configure table or partition affinity + +You can configure table or partition affinity using the `AFFINITY` option in `CREATE TABLE` or `ALTER TABLE` statements. + +| Affinity level | Scope | Effect | +|---|---|---| +| `AFFINITY='table'` | Non-partitioned table | Enables affinity for the table. PD creates a single affinity group for all Regions of the table. | +| `AFFINITY='partition'` | Partitioned table | Enables affinity for each partition in the table. PD creates a separate affinity group for the Regions of each partition. For example, for a table with four partitions, PD creates four independent affinity groups. | +| `AFFINITY=''` or `AFFINITY='none'` | Tables configured with `AFFINITY='table'` or `AFFINITY='partition'` | Disables affinity for the table or partitions. When you disable affinity, PD deletes the corresponding affinity group for the target table or partition, so Regions of that table or partition are no longer subject to affinity scheduling constraints. Automatic Region splitting in TiKV reverts to the default behavior within a maximum of 10 minutes. | + +**Examples** + +Enable affinity when creating a non-partitioned table: + +```sql +CREATE TABLE t1 (a INT) AFFINITY = 'table'; +``` + +Enable affinity for each partition when creating a partitioned table: + +```sql +CREATE TABLE tp1 (a INT) + AFFINITY = 'partition' + PARTITION BY HASH(a) PARTITIONS 4; +``` + +Enable affinity for an existing non-partitioned table: + +```sql +CREATE TABLE t2 (a INT); +ALTER TABLE t2 AFFINITY = 'table'; +``` + +Disable table affinity: + +```sql +ALTER TABLE t1 AFFINITY = ''; +``` + +### View affinity information + +You can view table or partition affinity information in the following ways: + +- Execute the [`SHOW AFFINITY`](/sql-statements/sql-statement-show-affinity.md) statement. In the `Status` column, you can view tables or partitions with affinity enabled and their scheduling status. The meanings of the values in the `Status` column are as follows: + + - `Pending`: PD has not started affinity scheduling for the table or partition, such as when Leaders or Voters are not yet determined. + - `Preparing`: PD is scheduling Regions to meet affinity requirements. 
+ - `Stable`: all Regions have reached the target distribution. + +- Query the [`INFORMATION_SCHEMA.TABLES`](/information-schema/information-schema-tables.md) table and check the `TIDB_AFFINITY` column for the affinity level of a table. +- Query the [`INFORMATION_SCHEMA.PARTITIONS`](/information-schema/information-schema-partitions.md) table and check the `TIDB_AFFINITY` column for the affinity level of a partition. + +## Notes + +- **Automatic splitting of Regions**: when a Region belongs to an affinity group and affinity is in effect, automatic splitting of that Region is disabled by default to avoid the creation of too many Regions that could weaken the affinity effect. Automatic splitting is triggered only when the Region size exceeds four times the value of [`schedule.max-affinity-merge-region-size`](/pd-configuration-file.md#max-affinity-merge-region-size-new-in-v855). Note that splits triggered by components other than TiKV or PD (such as manual splits triggered by [`SPLIT TABLE`](/sql-statements/sql-statement-split-region.md)) are not subject to this restriction. + +- **Degradation and expiration mechanism**: if the TiKV nodes hosting the target Leaders or Voters in an affinity group become unavailable (for example, due to node failure or insufficient disk space), if a Leader is evicted, or if there is a conflict with existing placement rules, PD marks the affinity group as degraded. During degradation, affinity scheduling for the corresponding table or partition is paused. + + - If the affected nodes recover within 10 minutes, PD resumes scheduling based on the original affinity settings. + - If the affected nodes do not recover within 10 minutes, the affinity group is marked as expired. At this point, PD restores normal scheduling behavior (the status in [`SHOW AFFINITY`](/sql-statements/sql-statement-show-affinity.md) returns to `Pending`), and automatically updates Leaders and Voters in the affinity group to re-enable affinity scheduling. + +## Related statements and configurations + +- `AFFINITY` option in [`CREATE TABLE`](/sql-statements/sql-statement-create-table.md) and [`ALTER TABLE`](/sql-statements/sql-statement-alter-table.md) +- [`SHOW AFFINITY`](/sql-statements/sql-statement-show-affinity.md) +- PD configuration items: [`schedule.affinity-schedule-limit`](/pd-configuration-file.md#affinity-schedule-limit-new-in-v855) and [`schedule.max-affinity-merge-region-size`](/pd-configuration-file.md#max-affinity-merge-region-size-new-in-v855) \ No newline at end of file diff --git a/tidb-configuration-file.md b/tidb-configuration-file.md index 43a8acdd87809..1ed50cd4abb6d 100644 --- a/tidb-configuration-file.md +++ b/tidb-configuration-file.md @@ -637,6 +637,11 @@ Configuration items related to performance. + When the value of `force-init-stats` is `true`, TiDB needs to wait until statistics initialization is finished before providing services upon startup. Note that if there are a large number of tables and partitions and the value of [`lite-init-stats`](/tidb-configuration-file.md#lite-init-stats-new-in-v710) is `false`, setting `force-init-stats` to `true` might prolong the time it takes for TiDB to start providing services. + When the value of `force-init-stats` is `false`, TiDB can still provide services before statistics initialization is finished, but the optimizer uses pseudo statistics to make decisions, which might result in suboptimal execution plans. 
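+
+Both of the preceding items belong to the `performance` section of the TiDB configuration file. The following is a minimal sketch for illustration only; the values shown are examples, not recommendations:
+
+```toml
+[performance]
+# Wait for statistics initialization to finish before TiDB starts providing services.
+force-init-stats = true
+# Load lightweight statistics during startup to shorten the initialization time.
+lite-init-stats = true
+```
+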
+### `enable-async-batch-get` New in v8.5.5 + ++ Controls whether TiDB uses asynchronous mode to execute the Batch Get operator. Using asynchronous mode can reduce goroutine overhead and provide better performance. Generally, there is no need to modify this configuration item. ++ Default value: `false` + ## opentracing Configuration items related to opentracing. diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md index a2ea81b9af2f6..b0996424d7541 100644 --- a/tikv-configuration-file.md +++ b/tikv-configuration-file.md @@ -205,6 +205,14 @@ This document only describes the parameters that are not included in command-lin + Default value: `"3s"` + Minimum value: `"1s"` +### `graceful-shutdown-timeout` New in v8.5.5 + ++ Specifies the timeout duration for TiKV graceful shutdown. + + When this value is greater than `0s`, TiKV attempts to transfer all leaders on this node to other TiKV nodes within the specified timeout before shutting down. If there are still leaders that have not been transferred when the timeout is reached, TiKV skips the remaining leader transfers and proceeds directly to the shutdown process. + + When this value is `0s`, TiKV graceful shutdown is disabled. ++ Default value: `"20s"` ++ Minimum value: `"0s"` + ### `concurrent-send-snap-limit` + The maximum number of snapshots sent at the same time @@ -288,6 +296,13 @@ This document only describes the parameters that are not included in command-lin + Sets the size of the connection pool for service and forwarding requests to the server. Setting it to too small a value affects the request latency and load balancing. + Default value: `4` +### `inspect-network-interval` New in v8.5.5 + ++ Controls the interval at which the TiKV HealthChecker actively performs network detection to PD and other TiKV nodes. TiKV calculates a `NetworkSlowScore` based on the network detection results and reports the network status of slow nodes to PD. ++ Setting this value to `0` disables the network detection. Setting it to a smaller value increases the detection frequency, which helps detect network jitter more quickly, but it also consumes more network bandwidth and CPU resources. ++ Default value: `100ms` ++ Value range: `0` or `[10ms, +∞)` + ## readpool.unified Configuration items related to the single thread pool serving read requests. This thread pool supersedes the original storage thread pool and coprocessor thread pool since the 4.0 version. @@ -327,6 +342,19 @@ Configuration items related to the single thread pool serving read requests. Thi + Controls whether to automatically adjust the thread pool size. When it is enabled, the read performance of TiKV is optimized by automatically adjusting the UnifyReadPool thread pool size based on the current CPU usage. The possible range of the thread pool is `[max-thread-count, MAX(4, CPU)]`. The maximum value is the same as the one of [`max-thread-count`](#max-thread-count). + Default value: `false` +### `cpu-threshold` New in v8.5.5 + ++ Specifies the CPU utilization threshold for the unified read pool. For example, if you set this value to `0.8`, the thread pool can use up to 80% of the CPU. + + + By default (when it is `0.0`), there is no limit on the CPU usage of the unified read pool. The size of the thread pool is determined solely by the busy thread scaling algorithm, which adjusts the size dynamically based on the number of threads handling current tasks. 
+ + If it is set to a value greater than `0.0`, TiKV applies the following CPU usage threshold constraints in addition to the existing busy-thread scaling algorithm to control CPU resource usage more strictly: + + Forced scale-down: when the CPU usage of the unified read pool exceeds the configured value plus a 10% buffer, TiKV forcibly reduces the size of the pool. + + Scale-up prevention: when expanding the unified read pool would cause CPU usage to exceed the configured threshold minus a 10% buffer, TiKV prevents the unified read pool from further expanding. + ++ This feature takes effect only when [`readpool.unified.auto-adjust-pool-size`](#auto-adjust-pool-size-new-in-v630) is set to `true`. ++ Default value: `0.0` ++ Value range: `[0.0, 1.0]` + ## readpool.storage Configuration items related to storage thread pool. diff --git a/upgrade-tidb-using-tiup.md b/upgrade-tidb-using-tiup.md index 73c1e32f9f489..20be62fe6c136 100644 --- a/upgrade-tidb-using-tiup.md +++ b/upgrade-tidb-using-tiup.md @@ -69,6 +69,7 @@ The following provides release notes you need to know when you upgrade from v8.4 - TiDB v8.5.2 [release notes](/releases/release-8.5.2.md) - TiDB v8.5.3 [compatibility changes](/releases/release-8.5.3.md#compatibility-changes) - TiDB v8.5.4 [compatibility changes](/releases/release-8.5.4.md#compatibility-changes) +- TiDB v8.5.5 [compatibility changes](https://docs.pingcap.com/tidb/v8.5/release-8.5.5/#compatibility-changes) ### Step 2: Upgrade TiUP or TiUP offline mirror diff --git a/variables.json b/variables.json index 7ed8eb31aa759..28fe30c11607c 100644 --- a/variables.json +++ b/variables.json @@ -1,7 +1,7 @@ { "tidb": "TiDB", - "tidb-version": "8.5.4", - "tidb-release-date": "2025-11-27", + "tidb-version": "8.5.5", + "tidb-release-date": "2026-01-15", "self-managed": "TiDB Self-Managed", "starter": "TiDB Cloud Starter", "essential": "TiDB Cloud Essential",