Commit 4300ddf

Merge branch 'main' into migrate-from-dbs
2 parents df635bc + 2777110

File tree

8 files changed, +327 -33 lines changed


docs/cn/guides/20-cloud/10-using-databend-cloud/01-warehouses.md

Lines changed: 38 additions & 31 deletions

(Content below is translated from Chinese. Most hunks reflow a table and a list without changing their wording; the substantive change swaps a .png screenshot for a .gif.)

@@ -6,7 +6,7 @@ import PlaySVG from '@site/static/img/icon/play.svg'

Context: the icon imports (SuspendSVG, CheckboxSVG, EllipsisSVG) are unchanged; line 9's `import { Button } from 'antd'` is rewritten with identical content.

Warehouses are the core component of Databend Cloud. A warehouse represents a pool of compute resources, including CPU, memory, and local cache. You must have a running warehouse to perform SQL tasks, for example:

@@ -20,14 +20,14 @@ import { Button } from 'antd'

In Databend Cloud, warehouses come in multiple sizes, each defined by the maximum number of concurrent queries it can handle. When creating a warehouse, you can choose from the following sizes (the table is reflowed in this commit; its content is unchanged):

| Size                  | Max Concurrency | Recommended Use Cases                                                                                              |
| --------------------- | --------------- | ------------------------------------------------------------------------------------------------------------------ |
| XSmall                | 2               | Best for simple tasks such as testing or running lightweight queries. Suitable for small datasets (~50 GB).         |
| Small                 | 4               | Ideal for regular reports and moderate workloads. Suitable for medium datasets (~200 GB).                           |
| Medium                | 8               | Good for teams running more complex queries with higher concurrency. Suitable for larger datasets (~1 TB).          |
| Large                 | 16              | Ideal for organizations running many concurrent queries. Suitable for large datasets (~5 TB).                       |
| XLarge                | 32              | Built for enterprise-grade, high-concurrency workloads. Suitable for very large datasets (over 10 TB).              |
| Multi-Cluster Scaling | Up to unlimited | Scales out and in automatically with workload demand, the most cost-effective way to raise concurrency as needed.   |

To pick the right warehouse size, Databend recommends starting small. A smaller warehouse may take longer than a Medium or Large one to execute SQL tasks. If you find that queries run too long (for example, several minutes), consider upgrading to a Medium or Large warehouse for faster results.

@@ -46,39 +46,46 @@ import { Button } from 'antd'

You can perform bulk operations on warehouses, including bulk restart, bulk suspend, bulk resume, and bulk delete. To do so, select the warehouses for the bulk operation by checking the checkboxes <CheckboxSVG/> in the warehouse list, then click the ellipsis button <EllipsisSVG/> for the desired operation.

-![alt text](../../../../../static/img/cloud/bulk.png)
+![alt text](../../../../../static/img/cloud/bulk.gif)

### Best Practices

To manage your warehouses effectively and ensure optimal performance and cost efficiency, consider the following best practices. These guidelines will help you size, organize, and fine-tune warehouses for various workloads and environments (the list is re-indented in this commit; its content is unchanged):

- **Choose the Right Size**

  - For **development and testing**, use smaller warehouses (XSmall, Small).
  - For **production**, choose larger warehouses (Medium, Large, XLarge).

- **Separate Warehouses**

  - Use separate warehouses for **data loading** and **query execution**.
  - Create distinct warehouses for **development**, **testing**, and **production** environments.

- **Data Loading Tips**

  - Smaller warehouses (Small, Medium) are suitable for data loading.
  - Optimize file sizes and file counts for better performance.

- **Optimize Cost and Performance**

  - Avoid running trivial queries such as `SELECT 1` to minimize credit usage.
  - Use bulk loading (`COPY`) instead of individual `INSERT` statements.
  - Monitor long-running queries and optimize them for better performance.

- **Auto-Suspend**

  - Enable auto-suspend to save credits while a warehouse is idle.

- **Disable Auto-Suspend for Frequent Queries**

  - Keep the warehouse active for frequent or repeated queries to preserve caches and avoid latency.

- **Use Auto-Scaling (Business and Dedicated Plans Only)**

  - Multi-cluster scaling adjusts resources automatically based on workload demand.

- **Monitor and Adjust Usage**
  - Review warehouse usage regularly and resize as needed to balance cost and performance.

## Warehouse Access Control

@@ -137,4 +144,4 @@ Databend Cloud allows you to manage warehouses by assigning them specific roles…

1. Click **Connect** on the **Overview** page.
2. Select the database and warehouse you want to connect to. The connection information updates with your selection.
3. The connection details include a SQL user named `cloudapp` with a randomly generated password. Databend Cloud does not store this password, so be sure to copy it and keep it somewhere safe. If you forget the password, click **Reset** to generate a new one.

docs/en/guides/20-cloud/10-using-databend-cloud/01-warehouses.md

Lines changed: 1 addition & 1 deletion

@@ -46,7 +46,7 @@ A suspended warehouse does not consume any credits. You can manually suspend or …

You can perform bulk operations on warehouses, including bulk restart, bulk suspend, bulk resume, and bulk delete. To do so, select the warehouses for bulk operations by checking the checkboxes <CheckboxSVG/> in the warehouse list, and then click the ellipsis button <EllipsisSVG/> for the desired operation.

-![alt text](../../../../../static/img/cloud/bulk.png)
+![alt text](../../../../../static/img/cloud/bulk.gif)

### Best Practices

Lines changed: 254 additions & 0 deletions

@@ -0,0 +1,254 @@

---
title: Disaster Recovery
description: Enable Databend to recover from disasters involving the loss of either metadata or data.
---

- RFC PR: [databendlabs/databend-docs#1546](https://github.com/databendlabs/databend-docs/pull/1546)
- Tracking Issue: [datafuselabs/databend#17234](https://github.com/databendlabs/databend/issues/17234)

## Summary

Enable Databend to recover from disasters involving the loss of either metadata or data.

## Motivation

Databend is designed to be highly available and fault-tolerant. Its metadata is served by Databend MetaSrv, which is powered by [OpenRaft](https://github.com/databendlabs/openraft). The data is stored in object storage systems such as S3, GCS, and others, which guarantee 99.99% availability and 99.999999999% durability.

However, this is insufficient for enterprise users who require a robust disaster recovery plan. These users either have significant needs for cross-continent disaster recovery or must comply with stringent regulatory requirements.

For example, [The Health Insurance Portability and Accountability Act (HIPAA)](https://www.hhs.gov/hipaa/index.html) mandates that healthcare organizations develop and implement contingency plans. Such planning ensures that, in the event of a natural or man-made disaster disrupting operations, the business can continue functioning until regular services are restored.

This RFC proposes a solution to enable Databend to recover from disasters involving the loss of metadata or data.

## Guide-Level Explanation

This RFC introduces the first step toward enabling Databend to recover from disasters, such as metadata or data loss, by providing a robust backup and restore solution. Our proposed product, tentatively named `bendsave`, will allow users to back up and restore both metadata and data efficiently.

*The product's name is not decided yet; we will call it `bendsave` for now.*

### 1. Backup

Create backups of cluster data and metadata using the `bendsave backup` command. Incremental backups are supported, ensuring that only changes since the last backup are saved. This simplifies daily backups.

Example:

```shell
bendsave backup --from /path/to/query-node-1.toml --to s3://backup/
```

Key Points:

- Metadata and data are stored in the backup location.
- Enables complete cluster recovery, even in cases of total failure.

### 2. List Backups

To view all backups stored in a specified location, use the `bendsave list` command.

Example:

```shell
bendsave list s3://backup/
```

### 3. Restore

Restore a Databend cluster from a backup using the `bendsave restore` command. By default, this operates in dry-run mode to prevent accidental restoration. To actually perform the restoration, pass the `--confirm` flag.

Example:

```shell
# Dry-run mode (default)
bendsave restore --from s3://backup/path/to/backup/manifest --to /path/to/query-node-1.toml

# Perform the restoration immediately
bendsave restore --from s3://backup/path/to/backup/manifest --to /path/to/query-node-1.toml --confirm
```

### 4. Vacuum

Manage backup retention using the `bendsave vacuum` command. This ensures backups adhere to your retention policies by removing old or unnecessary backups.

Example:

```shell
bendsave vacuum s3://backup \
  --retention-days 30 \
  --min-retention-days 7 \
  --max-backups 5 \
  --min-backups 2
```
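The RFC does not pin down how these four flags interact. The sketch below is one plausible interpretation, written in Rust to match the RFC's own code: never go below `--min-backups` or delete anything younger than `--min-retention-days`; otherwise delete backups that exceed `--retention-days` or that push the count above `--max-backups`. The `VacuumPolicy` struct and `plan_vacuum` function are illustrative names, not part of bendsave.

```rust
/// Illustrative retention policy mirroring the `bendsave vacuum` flags.
pub struct VacuumPolicy {
    pub retention_days: u64,
    pub min_retention_days: u64,
    pub max_backups: usize,
    pub min_backups: usize,
}

/// `ages_days` holds the age of each backup in days, sorted oldest first.
/// Returns the indices of backups that should be deleted.
pub fn plan_vacuum(policy: &VacuumPolicy, ages_days: &[u64]) -> Vec<usize> {
    let total = ages_days.len();
    let mut delete = Vec::new();
    for (i, &age) in ages_days.iter().enumerate() {
        let kept_so_far = total - delete.len();
        // Hard floors: keep at least `min_backups`, and keep anything
        // still inside the minimum retention window.
        if kept_so_far <= policy.min_backups || age < policy.min_retention_days {
            continue;
        }
        // Delete if expired by age, or if we still hold more than `max_backups`
        // (oldest backups go first because the input is sorted oldest first).
        let over_cap = kept_so_far > policy.max_backups;
        if age > policy.retention_days || over_cap {
            delete.push(i);
        }
    }
    delete
}
```

With the flags from the example above, a 40-day-old backup is deleted both for age and for exceeding the five-backup cap, while backups inside the seven-day window are always kept.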
The `bendsave` tool will provide a simple yet powerful way to secure Databend clusters through backup and restore operations. With features like incremental backups, dry-run restore mode, and vacuum-based retention management, it offers users control and reliability in disaster recovery scenarios.

## Reference-Level Explanation

`bendsave` will introduce a `BackupManifest` that stores the following:

- Metadata of the given backup: backup time, backup location, backup type (full or incremental), and so on.
- The locations of the metadata backup: the locations that point to the metadata backup.
- The locations of the data backup: the locations that contain all table data.

```rust
struct BackupManifest {
    backup_meta: BackupMeta,

    metasrv: BackupFile,
    storage: Vec<BackupFile>,
    ...
}

struct BackupMeta {
    backup_time: DateTime<Utc>,
    ...
}

struct BackupFile {
    blocks: Vec<BackupBlock>,
    etag: String,
}

struct BackupBlock {
    block_id: String,
    block_size: u64,
    ...
}
```

The `BackupManifest` will be encoded with protobuf and stored inside the backup storage along with the backup metadata and data.

The protobuf definition of `BackupManifest` will be versioned to ensure both backward and forward compatibility. This will enable Databend Query to restore backups created with different versions of Databend.

### Backup Storage Layout

The backup storage layout will be as follows:

```
s3://backup/bendsave.md
s3://backup/manifests/20250114_201500.manifest
s3://backup/manifests/20250115_201500.manifest
s3://backup/manifests/20250116_201500.manifest
s3://backup/data/<block_id_0>
s3://backup/data/<block_id_1>
s3://backup/data/<block_id_....>
s3://backup/data/<block_id_N>
```

- `bendsave.md` serves as a quick reference guide to help users understand the backup storage and recover the cluster.
- Each manifest in the `manifests/` directory includes everything needed to restore the cluster.
- The `data/` directory stores all the data blocks. Bendsave splits the source data into fixed-size blocks (e.g., 8 MiB) and uses their SHA-256 checksum as the block ID.
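The block-splitting step can be sketched as follows. This is purely illustrative: the RFC specifies SHA-256 block IDs, but Rust's standard library has no SHA-256, so `DefaultHasher` stands in for the checksum here (it must not be used for real backups), and a tiny `BLOCK_SIZE` replaces the ~8 MiB block size so the example is easy to follow.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Fixed block size; the RFC suggests something like 8 MiB.
/// A tiny value is used here purely for illustration.
pub const BLOCK_SIZE: usize = 4;

/// Content-addressed ID of one block. Stand-in for the SHA-256
/// checksum named in the RFC (see the hedge in the lead-in).
pub fn block_id(block: &[u8]) -> String {
    let mut h = DefaultHasher::new();
    block.hash(&mut h);
    format!("{:016x}", h.finish())
}

/// Split source data into fixed-size blocks and return
/// `(block_id, bytes)` pairs, mirroring the `data/<block_id>` layout.
pub fn split_into_blocks(data: &[u8]) -> Vec<(String, Vec<u8>)> {
    data.chunks(BLOCK_SIZE)
        .map(|c| (block_id(c), c.to_vec()))
        .collect()
}
```

Because the ID is derived from the block's content, two blocks with identical bytes map to the same object under `data/`, which is what makes deduplication and incremental backups work.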
### Backup Process

- Export all metasrv data and save it to the backup storage.
- Enumerate the source backend storage services to create a `BackupManifest` file.
- Copy all data files to the backup storage.

For incremental backups, Databend examines the existing `BackupManifest` file and transfers only the modified data files to the backup storage, along with a new `BackupManifest` file.

For example, the first time users perform a backup:

```shell
bendsave backup --from /path/to/query-node-1.toml --to s3://backup/
```

they will see the following files created:

```shell
s3://backup/bendsave.md
s3://backup/manifests/20250114_201500.manifest
s3://backup/data/<sha256_of_block_0>
s3://backup/data/<sha256_of_block_1>
s3://backup/data/<sha256_of_block_....>
s3://backup/data/<sha256_of_block_N>
```

The second time users perform a backup, bendsave will generate the following files, omitting blocks that already exist:

```shell
s3://backup/bendsave.md
s3://backup/manifests/20250114_201500.manifest
s3://backup/manifests/20250115_201500.manifest
s3://backup/data/<sha256_of_block_0>
s3://backup/data/<sha256_of_block_1>
s3://backup/data/<sha256_of_block_....>
s3://backup/data/<sha256_of_block_N>
s3://backup/data/<sha256_of_block_....>
s3://backup/data/<sha256_of_block_M>
```

The block ID is generated from the SHA-256 checksum of the block content, so a block can be reused if it has been backed up before.
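A minimal sketch of the dedup step that makes incremental backups cheap: given the content-addressed IDs already present under `data/`, only unseen blocks need to be uploaded. `blocks_to_upload` is an illustrative name, not bendsave's actual API.

```rust
use std::collections::HashSet;

/// Given the block IDs already present in `s3://backup/data/` and the
/// block IDs referenced by the new `BackupManifest`, return only the
/// blocks that still need uploading. Unchanged data keeps the same
/// content-derived IDs, so it is skipped.
pub fn blocks_to_upload(existing: &HashSet<String>, wanted: &[String]) -> Vec<String> {
    let mut seen = existing.clone();
    let mut upload = Vec::new();
    for id in wanted {
        // `insert` returns false if the ID was already present, covering
        // both previously backed-up blocks and duplicates within the
        // new backup itself.
        if seen.insert(id.clone()) {
            upload.push(id.clone());
        }
    }
    upload
}
```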
### Restore Process

- Read the `BackupManifest` file from the backup storage.
- Copy all related data files back to their original locations.
- Read the backed-up metasrv data and import it into the new metasrv cluster.

Please note that the restore process will overwrite the entire MetaSrv cluster. All existing metadata in the restore target MetaSrv cluster will be permanently lost.

Users can restore from a backup using the following command:

```shell
bendsave restore --from s3://backup/manifests/20250114_201500.manifest --to /path/to/query-node-1.toml
```

Users can also restore from an incremental backup by specifying its manifest file:

```shell
bendsave restore --from s3://backup/manifests/20250115_201500.manifest --to /path/to/query-node-1.toml
```
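Conceptually, the data half of restore is the inverse of block splitting: each `BackupFile`'s block list is looked up in `data/` and concatenated back into the original file. A hedged sketch, where `reassemble` and the in-memory `store` map are illustrative stand-ins for the object-storage reads, not bendsave's actual design:

```rust
use std::collections::HashMap;

/// Reassemble one backed-up file from the content-addressed block store.
/// `store` models the `data/` directory (block ID -> block bytes);
/// `block_ids` is the ordered block list from a `BackupFile`.
/// Returns None if any referenced block is missing from the store.
pub fn reassemble(store: &HashMap<String, Vec<u8>>, block_ids: &[String]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    for id in block_ids {
        // A missing block means the backup is incomplete or corrupted.
        out.extend_from_slice(store.get(id)?);
    }
    Some(out)
}
```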
## Drawbacks

None.

## Rationale and alternatives

None.

## Prior art

### Databricks Clone

Databricks allows users to perform shallow and deep clones of a table.

For example, use clone for data archiving:

```sql
CREATE OR REPLACE TABLE archive_table CLONE my_prod_table;
```

Or use clone for short-term experiments on a production table:

```sql
-- Perform a shallow clone
CREATE OR REPLACE TABLE my_test SHALLOW CLONE my_prod_table;

UPDATE my_test SET invalid = true WHERE user_id IS NULL;
-- Run a bunch of validations. Once happy:

-- This should leverage the update information in the clone to prune to only
-- changed files in the clone if possible
MERGE INTO my_prod_table
USING my_test
ON my_test.user_id <=> my_prod_table.user_id
WHEN MATCHED AND my_test.user_id IS NULL THEN UPDATE *;

DROP TABLE my_test;
```

## Unresolved questions

None.

## Future possibilities

### Replication

In the future, we could extend the backup and restore functionality to support replication. This would allow users to replicate databases or tables across different Databend clusters for disaster recovery or data distribution purposes.

Databend could also implement a warm standby to ensure high availability and fault tolerance.

docusaurus.config.ts

Lines changed: 1 addition & 0 deletions

@@ -115,6 +115,7 @@ const config: Config = {
     "docusaurus-plugin-sass",
     "./src/plugins/global-sass-var-inject",
     "./src/plugins/fetch-databend-releases",
+    "./src/plugins/gurubase-widget",
     [
       "@docusaurus/plugin-content-docs",
       /** @type {import('@docusaurus/plugin-content-docs').Options} */

lychee.toml

Lines changed: 2 additions & 1 deletion

@@ -102,7 +102,8 @@ exclude = [
   'https://singup.snowflake.com/',
   'https://www.uber.com/',
   'https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Cascades-graefe.pdf',
-  '^https://repo\.databend\.com/databend/'
+  '^https://repo\.databend\.com/databend/',
+  'https://www.hhs.gov',
 ]

# URLs to check (supports regex). Has preference over all excludes.
