docs/en/guides/57-data-management/04-data-recycle.md
Lines changed: 62 additions & 51 deletions
Original file line number
Diff line number
Diff line change
@@ -3,89 +3,100 @@ title: Data Purge and Recycle
3
3
sidebar_label: Data Recycle
4
4
---
5
5
6
-
In Databend, the data is not truly deleted when you run `DROP`, `TRUNCATE`, or `DELETE` commands, allowing for time travel back to previous states.
6
+
## Overview
7
7
8
-
There are two types of data:
8
+
In Databend, data is not immediately deleted when you run `DROP`, `TRUNCATE`, or `DELETE` commands. This enables Databend's time travel feature, allowing you to access previous states of your data. However, this approach means that storage space is not automatically freed up after these operations.
9
9
10
-
- **History Data**: Used by Time Travel to store historical data or data from dropped tables.
11
-
- **Temporary Data**: Used by the system to store spilled data.
10
+
## Types of Data to Clean
12
11
13
-
If the data size is significant, you can run several commands ([Enterprise Edition Features](/guides/products/dee/enterprise-features)) to delete these data and free up storage space.
12
+
In Databend, there are four main types of data that may need cleaning:
14
13
15
-
## Spill Data Storage
14
+
1. **Dropped Table Data**: Data files from tables that have been dropped using the DROP TABLE command
15
+
2. **Table History Data**: Historical versions of tables, including snapshots created through UPDATE, DELETE, and other operations
16
+
3. **Orphan Files**: Snapshots, segments, and blocks that are no longer associated with any table
17
+
4. **Spill Temporary Files**: Temporary files created when memory usage exceeds available limits during query execution (for joins, aggregates, sorts, etc.). Databend automatically cleans up these files when queries complete normally. Manual cleanup is only needed in rare cases when Databend crashes or shuts down unexpectedly during query execution.
16
18
17
-
Self-hosted Databend supports spilling intermediate query results to disk when memory usage exceeds available limits. Users can configure where spill data is stored, choosing between local disk storage and a remote S3-compatible bucket.
19
+
## Using VACUUM Commands
18
20
19
-
### Spill Storage Options
21
+
The VACUUM command family is the primary method for cleaning data in Databend ([Enterprise Edition Feature](/guides/products/dee/enterprise-features)). Different VACUUM subcommands are used depending on the type of data you need to clean.
20
22
21
-
Databend provides the following spill storage configurations:
23
+
### VACUUM DROP TABLE
22
24
23
-
- Local Disk Storage: Spilled data is written to a specified local directory in the query node. Please note that local disk storage is supported only for [Windows Functions](/sql/sql-functions/window-functions/).
24
-
- Remote S3-Compatible Storage: Spilled data is stored in an external bucket.
25
-
- Default Storage: If no spill storage is configured, Databend spills data to the default storage bucket along with your table data.
25
+
This command permanently deletes data files of dropped tables, freeing up storage space.
26
26
27
-
### Spill Priority
27
+
```sql
28
+
VACUUM DROP TABLE [FROM <database_name>] [DRY RUN [SUMMARY]] [LIMIT <file_count>];
29
+
```
28
30
29
-
If both local and S3-compatible spill storage are configured, Databend follows this order:
31
+
**Options:**
32
+
- `FROM <database_name>`: Restrict to a specific database
33
+
- `DRY RUN [SUMMARY]`: Preview files to be removed without actually deleting them
34
+
- `LIMIT <file_count>`: Limit the number of files to be vacuumed
30
35
31
-
1. Spill to local disk first (if configured).
32
-
2. Spill to remote S3-compatible storage when local disk space is insufficient.
33
-
3. Spill to Databend’s default storage bucket if neither local nor external S3-compatible storage is configured.
36
+
**Examples:**
34
37
35
-
### Configuring Spill Storage
38
+
```sql
39
+
-- Preview files that would be removed
40
+
VACUUM DROP TABLE DRY RUN;
36
41
37
-
To configure spill storage, update the [databend-query.toml](https://github.com/databendlabs/databend/blob/main/scripts/distribution/configs/databend-query.toml) configuration file.
42
+
-- Preview summary of files that would be removed
43
+
VACUUM DROP TABLE DRY RUN SUMMARY;
38
44
39
-
This example sets Databend to use up to 1 TB of local disk space for spill operations, while reserving 40% of the disk for system use:
45
+
-- Remove dropped tables from the "default" database
This example sets Databend to use MinIO as an S3-compatible storage service for spill operations:
49
-
50
-
```toml
51
-
[spill]
52
-
[spill.storage]
53
-
type = "s3"
54
-
[spill.storage.s3]
55
-
bucket = "databend"
56
-
root = "admin"
57
-
endpoint_url = "http://127.0.0.1:9900"
58
-
access_key_id = "minioadmin"
59
-
secret_access_key = "minioadmin"
60
-
allow_insecure = true
52
+
### VACUUM TABLE
53
+
54
+
This command removes historical data for a specified table, clearing old versions and freeing storage.
55
+
56
+
```sql
57
+
VACUUM TABLE <table_name> [DRY RUN [SUMMARY]];
61
58
```
62
59
63
-
## Purge Drop Table Data
60
+
**Options:**
61
+
- `DRY RUN [SUMMARY]`: Preview files to be removed without actually deleting them
64
62
65
-
Deletes data files of all dropped tables, freeing up storage space.
63
+
**Examples:**
66
64
67
65
```sql
68
-
VACUUM DROP TABLE;
69
-
```
66
+
-- Preview files that would be removed
67
+
VACUUM TABLE my_table DRY RUN;
68
+
69
+
-- Preview summary of files that would be removed
70
+
VACUUM TABLE my_table DRY RUN SUMMARY;
70
71
71
-
See more [VACUUM DROP TABLE](/sql/sql-commands/administration-cmds/vacuum-drop-table).
72
+
-- Remove historical data from my_table
73
+
VACUUM TABLE my_table;
74
+
```
72
75
73
-
## Purge Table History Data
76
+
### VACUUM TEMPORARY FILES
74
77
75
-
Removes historical data for a specified table, clearing old versions and freeing storage.
78
+
This command clears temporary spilled files used for joins, aggregates, and sorts, freeing up storage space.
76
79
77
80
```sql
78
-
VACUUM TABLE <table_name>;
81
+
VACUUM TEMPORARY FILES;
79
82
```
80
83
81
-
See more [VACUUM TABLE](/sql/sql-commands/administration-cmds/vacuum-table).
84
+
**Note:** While this command is provided as a manual method for cleaning up temporary files, it's rarely needed during normal operation since Databend automatically handles cleanup in most cases.
82
85
83
-
## Purge Temporary Data
86
+
## Adjusting Data Retention Time
84
87
85
-
Clears temporary spilled files used for joins, aggregates, and sorts, freeing up storage space.
88
+
The VACUUM commands remove data files older than the `DATA_RETENTION_TIME_IN_DAYS` setting. By default, Databend retains historical data for 1 day (24 hours). You can adjust this setting:
86
89
87
90
```sql
88
-
VACUUM TEMPORARY FILES;
91
+
-- Change retention period to 2 days
92
+
SET GLOBAL DATA_RETENTION_TIME_IN_DAYS = 2;
93
+
94
+
-- Check current retention setting
95
+
SHOW SETTINGS LIKE 'DATA_RETENTION_TIME_IN_DAYS';
89
96
```
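Because the VACUUM commands only remove files older than the retention setting, it can help to check that setting and preview the effect before deleting anything. The following is a minimal sketch of such a sequence, not a prescribed procedure; it uses only the statements shown in this guide, and `my_table` is a placeholder table name.

```sql
-- Sketch only: my_table is a placeholder name, as in the examples in this guide.

-- 1. Confirm the retention window that VACUUM will respect.
SHOW SETTINGS LIKE 'DATA_RETENTION_TIME_IN_DAYS';

-- 2. Preview a summary of the files that would be removed.
VACUUM TABLE my_table DRY RUN SUMMARY;

-- 3. Remove the historical data once the preview looks right.
VACUUM TABLE my_table;
```

Running the `DRY RUN SUMMARY` step first is a cautious choice: historical data removed by `VACUUM TABLE` is deleted permanently, so it should no longer be available to time travel afterwards.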
90
97
91
-
See more [VACUUM TEMPORARY FILES](/sql/sql-commands/administration-cmds/vacuum-temp-files).
98
+
| Edition | Default Retention | Maximum Retention |