Commit 58422f4

Update 04-data-recycle.md
1 parent 01a106e commit 58422f4

1 file changed: +62 -51 lines changed

docs/en/guides/57-data-management/04-data-recycle.md

Lines changed: 62 additions & 51 deletions
Original file line number | Diff line number | Diff line change
@@ -3,89 +3,100 @@ title: Data Purge and Recycle
33
sidebar_label: Data Recycle
44
---
55

6-
In Databend, the data is not truly deleted when you run `DROP`, `TRUNCATE`, or `DELETE` commands, allowing for time travel back to previous states.
6+
## Overview
77

8-
There are two types of data:
8+
In Databend, data is not immediately deleted when you run `DROP`, `TRUNCATE`, or `DELETE` commands. This enables Databend's time travel feature, allowing you to access previous states of your data. However, this approach means that storage space is not automatically freed up after these operations.
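Because the data files stay in storage after a `DROP`, a dropped table can normally be brought back while it is still within the retention period. The following is a minimal sketch of that behavior; the table name `demo_recycle` is hypothetical and used only for illustration.

```sql
-- Hypothetical table used only for illustration
CREATE TABLE demo_recycle (id INT);
INSERT INTO demo_recycle VALUES (1), (2);

-- The table disappears from the catalog, but its data files remain in storage
DROP TABLE demo_recycle;

-- Restore the dropped table; this is expected to work only while the
-- data has not been vacuumed and is still within the retention period
UNDROP TABLE demo_recycle;

SELECT * FROM demo_recycle;
```

Once the dropped table's files have been vacuumed, the restore step should no longer be expected to succeed, which is why the cleanup commands are worth previewing before running them.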
99

10-
- **History Data**: Used by Time Travel to store historical data or data from dropped tables.
11-
- **Temporary Data**: Used by the system to store spilled data.
10+
## Types of Data to Clean
1211

13-
If the data size is significant, you can run several commands ([Enterprise Edition Features](/guides/products/dee/enterprise-features)) to delete these data and free up storage space.
12+
In Databend, there are four main types of data that may need cleaning:
1413

15-
## Spill Data Storage
14+
1. **Dropped Table Data**: Data files from tables that have been dropped using the `DROP TABLE` command.
15+
2. **Table History Data**: Historical versions of tables, including snapshots created through `UPDATE`, `DELETE`, and other operations.
16+
3. **Orphan Files**: Snapshots, segments, and blocks that are no longer associated with any table.
17+
4. **Spill Temporary Files**: Temporary files created when memory usage exceeds available limits during query execution (for joins, aggregates, sorts, etc.). Databend automatically cleans up these files when queries complete normally. Manual cleanup is only needed in rare cases when Databend crashes or shuts down unexpectedly during query execution.
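Before cleaning, it can help to see how much history and how many dropped tables actually exist. The sketch below assumes a table named `my_table` in the `default` database (both hypothetical here) and relies on the `FUSE_SNAPSHOT` table function and the `SHOW TABLES HISTORY` command; check the SQL reference for your Databend version, since the exact output columns may differ.

```sql
-- List the snapshots (historical versions) retained for one table.
-- 'default' and 'my_table' are placeholder names.
SELECT snapshot_id, timestamp
FROM FUSE_SNAPSHOT('default', 'my_table');

-- List tables in the current database, including dropped ones
-- that have not yet been vacuumed.
SHOW TABLES HISTORY;
```

A long snapshot list points to table history data, while dropped entries in the table history point to dropped table data; each is cleaned by a different command.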
1618

17-
Self-hosted Databend supports spilling intermediate query results to disk when memory usage exceeds available limits. Users can configure where spill data is stored, choosing between local disk storage and a remote S3-compatible bucket.
19+
## Using VACUUM Commands
1820

19-
### Spill Storage Options
21+
The VACUUM command family is the primary method for cleaning data in Databend ([Enterprise Edition Feature](/guides/products/dee/enterprise-features)). Different VACUUM subcommands are used depending on the type of data you need to clean.
2022

21-
Databend provides the following spill storage configurations:
23+
### VACUUM DROP TABLE
2224

23-
- Local Disk Storage: Spilled data is written to a specified local directory in the query node. Please note that local disk storage is supported only for [Windows Functions](/sql/sql-functions/window-functions/).
24-
- Remote S3-Compatible Storage: Spilled data is stored in an external bucket.
25-
- Default Storage: If no spill storage is configured, Databend spills data to the default storage bucket along with your table data.
25+
This command permanently deletes data files of dropped tables, freeing up storage space.
2626

27-
### Spill Priority
27+
```sql
28+
VACUUM DROP TABLE [FROM <database_name>] [DRY RUN [SUMMARY]] [LIMIT <file_count>];
29+
```
2830

29-
If both local and S3-compatible spill storage are configured, Databend follows this order:
31+
**Options:**
32+
- `FROM <database_name>`: Restrict to a specific database
33+
- `DRY RUN [SUMMARY]`: Preview files to be removed without actually deleting them
34+
- `LIMIT <file_count>`: Limit the number of files to be vacuumed
3035

31-
1. Spill to local disk first (if configured).
32-
2. Spill to remote S3-compatible storage when local disk space is insufficient.
33-
3. Spill to Databend’s default storage bucket if neither local nor external S3-compatible storage is configured.
36+
**Examples:**
3437

35-
### Configuring Spill Storage
38+
```sql
39+
-- Preview files that would be removed
40+
VACUUM DROP TABLE DRY RUN;
3641

37-
To configure spill storage, update the [databend-query.toml](https://github.com/databendlabs/databend/blob/main/scripts/distribution/configs/databend-query.toml) configuration file.
42+
-- Preview summary of files that would be removed
43+
VACUUM DROP TABLE DRY RUN SUMMARY;
3844

39-
This example sets Databend to use up to 1 TB of local disk space for spill operations, while reserving 40% of the disk for system use:
45+
-- Remove data files of dropped tables in the "default" database
46+
VACUUM DROP TABLE FROM default;
4047

41-
```toml
42-
[spill]
43-
spill_local_disk_path = "/data1/databend/databend_spill"
44-
spill_local_disk_reserved_space_percentage = 40
45-
spill_local_disk_max_bytes = 1099511627776
48+
-- Remove up to 1000 files from dropped tables
49+
VACUUM DROP TABLE LIMIT 1000;
4650
```
4751

48-
This example sets Databend to use MinIO as an S3-compatible storage service for spill operations:
49-
50-
```toml
51-
[spill]
52-
[spill.storage]
53-
type = "s3"
54-
[spill.storage.s3]
55-
bucket = "databend"
56-
root = "admin"
57-
endpoint_url = "http://127.0.0.1:9900"
58-
access_key_id = "minioadmin"
59-
secret_access_key = "minioadmin"
60-
allow_insecure = true
52+
### VACUUM TABLE
53+
54+
This command removes historical data for a specified table, clearing old versions and freeing storage.
55+
56+
```sql
57+
VACUUM TABLE <table_name> [DRY RUN [SUMMARY]];
6158
```
6259

63-
## Purge Drop Table Data
60+
**Options:**
61+
- `DRY RUN [SUMMARY]`: Preview files to be removed without actually deleting them
6462

65-
Deletes data files of all dropped tables, freeing up storage space.
63+
**Examples:**
6664

6765
```sql
68-
VACUUM DROP TABLE;
69-
```
66+
-- Preview files that would be removed
67+
VACUUM TABLE my_table DRY RUN;
68+
69+
-- Preview summary of files that would be removed
70+
VACUUM TABLE my_table DRY RUN SUMMARY;
7071

71-
See more [VACUUM DROP TABLE](/sql/sql-commands/administration-cmds/vacuum-drop-table).
72+
-- Remove historical data from my_table
73+
VACUUM TABLE my_table;
74+
```
7275

73-
## Purge Table History Data
76+
### VACUUM TEMPORARY FILES
7477

75-
Removes historical data for a specified table, clearing old versions and freeing storage.
78+
This command clears temporary spilled files used for joins, aggregates, and sorts, freeing up storage space.
7679

7780
```sql
78-
VACUUM TABLE <table_name>;
81+
VACUUM TEMPORARY FILES;
7982
```
8083

81-
See more [VACUUM TABLE](/sql/sql-commands/administration-cmds/vacuum-table).
84+
**Note:** While this command is provided as a manual method for cleaning up temporary files, it's rarely needed during normal operation since Databend automatically handles cleanup in most cases.
8285

83-
## Purge Temporary Data
86+
## Adjusting Data Retention Time
8487

85-
Clears temporary spilled files used for joins, aggregates, and sorts, freeing up storage space.
88+
The VACUUM commands only remove data files older than the retention period set by `DATA_RETENTION_TIME_IN_DAYS`. By default, Databend retains historical data for 1 day (24 hours). You can adjust this setting:
8689

8790
```sql
88-
VACUUM TEMPORARY FILES;
91+
-- Change retention period to 2 days
92+
SET GLOBAL DATA_RETENTION_TIME_IN_DAYS = 2;
93+
94+
-- Check current retention setting
95+
SHOW SETTINGS LIKE 'DATA_RETENTION_TIME_IN_DAYS';
8996
```
9097

91-
See more [VACUUM TEMPORARY FILES](/sql/sql-commands/administration-cmds/vacuum-temp-files).
98+
| Edition | Default Retention | Maximum Retention |
99+
| ---------------------------------------- | ----------------- | ---------------- |
100+
| Databend Community & Enterprise Editions | 1 day (24 hours) | 90 days |
101+
| Databend Cloud (Personal) | 1 day (24 hours) | 1 day (24 hours) |
102+
| Databend Cloud (Business) | 1 day (24 hours) | 90 days |
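Putting the pieces together, one cautious cleanup routine is to confirm the retention period first, preview what each command would delete, and only then run the real cleanup. This is a sketch using only the statements described in this guide; `my_table` is a placeholder name, and the VACUUM commands require the Enterprise Edition.

```sql
-- 1. Confirm how long historical data is currently retained
SHOW SETTINGS LIKE 'DATA_RETENTION_TIME_IN_DAYS';

-- 2. Preview the cleanup without deleting anything
VACUUM DROP TABLE DRY RUN SUMMARY;
VACUUM TABLE my_table DRY RUN SUMMARY;

-- 3. Run the actual cleanup once the preview looks right
VACUUM DROP TABLE;
VACUUM TABLE my_table;
```

Running the dry-run variants first is a deliberate choice: vacuumed files can no longer be reached through time travel, so the preview is the last point at which the operation can be reconsidered.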
