
Commit da51f12

Merge branch 'altertable-ai-su/compaction-size-filtering'
2 parents 00ee9e1 + 165c973

File tree

1 file changed: +43 −2 lines changed

docs/preview/duckdb/maintenance/merge_adjacent_files.md

Lines changed: 43 additions & 2 deletions
@@ -27,10 +27,51 @@ Or if you want to target a specific table within a schema:
 CALL ducklake_merge_adjacent_files('my_ducklake', 't', schema => 'some_schema');
 ```

-Compacting data files can be a very memory intensive operation. You may consider performing this operation in batches by specifying the `max_compacted_files` parameter.
+## Advanced Options
+
+The `merge_adjacent_files` function supports optional parameters that filter which files are considered for compaction and control memory usage. This enables advanced compaction strategies and finer-grained control over the compaction process.
+
+- **`max_compacted_files`**: caps the number of files compacted in a single operation. Compacting data files can be a very memory-intensive operation, so consider performing it in batches by specifying this parameter.
+- **`min_file_size`**: files smaller than this size (in bytes) are excluded from compaction. If not specified, all files are considered regardless of size.
+- **`max_file_size`**: files at or larger than this size (in bytes) are excluded from compaction. If not specified, it defaults to `target_file_size`. Must be greater than 0.
+
+Example with a compacted-files limit:

 ```sql
-CALL ducklake_merge_adjacent_files('my_ducklake', 't', schema => 'some_schema', max_compacted_files => 1000);
+CALL ducklake_merge_adjacent_files('my_ducklake', max_compacted_files => 100);
+```
+
+Example with size filtering:
+
+```sql
+-- Only merge files between 10KB and 100KB
+CALL ducklake_merge_adjacent_files('my_ducklake', min_file_size => 10240, max_file_size => 102400);
+```
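The size filter added above amounts to a simple predicate on file size. A minimal Python sketch of the documented semantics — purely illustrative, not DuckLake's implementation, and `eligible_for_compaction` is a hypothetical helper name:

```python
def eligible_for_compaction(size_bytes, min_file_size=None, max_file_size=None):
    """Mirror the documented filter semantics:
    files smaller than min_file_size are excluded,
    files at or larger than max_file_size are excluded."""
    if min_file_size is not None and size_bytes < min_file_size:
        return False
    if max_file_size is not None and size_bytes >= max_file_size:
        return False
    return True

# Matching the SQL example above: min 10240 bytes, max 102400 bytes
sizes = [4096, 10240, 51200, 102400, 204800]
selected = [s for s in sizes if eligible_for_compaction(s, 10240, 102400)]
# selected == [10240, 51200]: 10240 qualifies (at the minimum),
# 102400 does not (at or above the maximum)
```

Note the asymmetry: the minimum bound is inclusive while the maximum bound is exclusive, so adjacent size ranges can be used in back-to-back passes without a file matching twice.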
+
+### Example: Tiered Compaction Strategy for Streaming Workloads
+
+File size filtering enables tiered compaction strategies, which are particularly useful for real-time/streaming ingestion patterns. A tiered approach merges files in stages:
+
+- **Tier 0 → Tier 1**: done often; merges small files (< 1 MB) into ~5 MB files
+- **Tier 1 → Tier 2**: done occasionally; merges medium files (1 MB–10 MB) into ~32 MB files
+- **Tier 2 → Tier 3**: done rarely; merges large files (10 MB–64 MB) into ~128 MB files
+
+This strategy provides more predictable I/O amplification and better incremental compaction for streaming workloads.
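To make the amplification claim concrete: under this scheme each byte is rewritten at most once per tier promotion, so write amplification is bounded by the number of tiers. A back-of-the-envelope sketch (illustrative arithmetic only; the figures are hypothetical):

```python
# Each tier promotion rewrites the data once, so a byte that climbs
# from Tier 0 all the way to Tier 3 is rewritten at most 3 times.
ingested_mb = 1000                      # hypothetical amount of streamed data
promotions = 3                          # Tier 0->1, 1->2, 2->3
bytes_rewritten = ingested_mb * promotions
write_amplification = (ingested_mb + bytes_rewritten) / ingested_mb
# write_amplification == 4.0: the original write plus up to three rewrites
```

In practice amplification is lower, since the higher-tier passes run less often and only on data that actually accumulates there; the bound is what makes the I/O cost predictable.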
+
+Example tiered compaction workflow:
+
+```sql
+-- Tier 0 → Tier 1: merge small files
+CALL ducklake_set_option('my_ducklake', 'target_file_size', '5MB');
+CALL ducklake_merge_adjacent_files('my_ducklake', max_file_size => 1048576);
+
+-- Tier 1 → Tier 2: merge medium files
+CALL ducklake_set_option('my_ducklake', 'target_file_size', '32MB');
+CALL ducklake_merge_adjacent_files('my_ducklake', min_file_size => 1048576, max_file_size => 10485760);
+
+-- Tier 2 → Tier 3: merge large files
+CALL ducklake_set_option('my_ducklake', 'target_file_size', '128MB');
+CALL ducklake_merge_adjacent_files('my_ducklake', min_file_size => 10485760, max_file_size => 67108864);
 ```
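The tier boundaries in the workflow above can be read as a small classification rule over file sizes. A Python sketch of just those byte thresholds — an illustration of the example's numbers, not part of DuckLake, with `tier_of` a hypothetical name:

```python
MB = 1024 * 1024  # the workflow uses binary megabytes: 1048576, 10485760, 67108864

def tier_of(size_bytes):
    """Classify a data file by the tier boundaries used in the workflow."""
    if size_bytes < 1 * MB:
        return 0   # picked up by the Tier 0 -> Tier 1 pass
    if size_bytes < 10 * MB:
        return 1   # picked up by the Tier 1 -> Tier 2 pass
    if size_bytes < 64 * MB:
        return 2   # picked up by the Tier 2 -> Tier 3 pass
    return 3       # at or above 64 MB: left alone by this workflow

sizes = [512 * 1024, 5 * MB, 32 * MB, 128 * MB]
tiers = [tier_of(s) for s in sizes]
# tiers == [0, 1, 2, 3]
```

Because `min_file_size` is inclusive and `max_file_size` is exclusive, the three passes partition the size range cleanly: a file of exactly 1 MB belongs to the second pass, and one of exactly 10 MB to the third.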

 > Calling this function does not immediately delete the old files.
