Compacting data files can be a very memory intensive operation. You may consider performing this operation in batches by specifying the `max_compacted_files` parameter.
## Advanced Options
The `merge_adjacent_files` function supports optional parameters to filter which files are considered for compaction and control memory usage. This enables advanced compaction strategies and more granular control over the compaction process.
- **`max_compacted_files`**: Limits the maximum number of files to compact in a single operation. Because compaction can be very memory intensive, consider using this parameter to perform the work in batches.
- **`min_file_size`**: Files smaller than this size (in bytes) are excluded from compaction. If not specified, all files are considered regardless of minimum size.
- **`max_file_size`**: Files at or larger than this size (in bytes) are excluded from compaction. If not specified, it defaults to `target_file_size`. Must be greater than 0.
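To make the filtering semantics concrete, here is a minimal, self-contained sketch of the selection logic described above. `select_compaction_candidates` is a hypothetical helper for illustration, not part of the library: a file is a candidate when `min_file_size <= size < max_file_size`, and `max_compacted_files` caps how many are taken in one operation.

```python
def select_compaction_candidates(file_sizes, min_file_size=None,
                                 max_file_size=None, max_compacted_files=None):
    """Hypothetical illustration of the documented filters.

    Keeps files with min_file_size <= size < max_file_size, then caps
    the result at max_compacted_files entries.
    """
    candidates = [
        size for size in file_sizes
        if (min_file_size is None or size >= min_file_size)
        and (max_file_size is None or size < max_file_size)
    ]
    if max_compacted_files is not None:
        candidates = candidates[:max_compacted_files]
    return candidates

# Files of 0.5MB, 2MB, 8MB, and 64MB; only the 2MB and 8MB files
# fall inside the [1MB, 10MB) window.
sizes = [512_000, 2_000_000, 8_000_000, 64_000_000]
print(select_compaction_candidates(sizes,
                                   min_file_size=1_000_000,
                                   max_file_size=10_000_000))
# → [2000000, 8000000]
```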
### Example: Tiered Compaction Strategy for Streaming Workloads
File size filtering enables tiered compaction strategies, which are particularly useful for real-time/streaming ingestion patterns. A tiered approach merges files in stages:
- **Tier 0 → Tier 1**: Done often, merge small files (< 1MB) into ~5MB files
- **Tier 1 → Tier 2**: Done occasionally, merge medium files (1MB-10MB) into ~32MB files
- **Tier 2 → Tier 3**: Done rarely, merge large files (10MB-64MB) into ~128MB files
This compaction strategy provides more predictable I/O amplification and better incremental compaction for streaming workloads.
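The effect of one tier can be simulated with plain Python to see how file counts evolve. `run_tier` below is a hypothetical illustration of the strategy, not the library's API: it merges eligible files (those within the tier's size window) into outputs of roughly the target size, while files outside the window pass through untouched.

```python
MB = 1024 * 1024

def run_tier(file_sizes, min_size, max_size, target_size):
    """Hypothetical simulation of one compaction tier.

    Files with min_size <= size < max_size are merged into outputs of
    roughly target_size bytes; all other files pass through unchanged.
    """
    eligible, untouched = [], []
    for size in file_sizes:
        if (min_size is None or size >= min_size) and size < max_size:
            eligible.append(size)
        else:
            untouched.append(size)

    # Greedily fold eligible files into ~target_size outputs.
    merged, batch = [], 0
    for size in eligible:
        batch += size
        if batch >= target_size:
            merged.append(batch)
            batch = 0
    if batch:
        merged.append(batch)
    return untouched + merged

# Tier 0 -> Tier 1: twelve 0.5MB files plus one 32MB file.
# The 32MB file is untouched; the small files fold into ~5MB outputs.
sizes = [512 * 1024] * 12 + [32 * MB]
print(run_tier(sizes, None, 1 * MB, 5 * MB))
# → [33554432, 5242880, 1048576]
```

Running each tier at a different cadence (often, occasionally, rarely) is what keeps the write amplification of any single pass bounded.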