Commit f7688c4
fix: use single writer for all partition streams (#3870)
# Description
Collecting the streams concurrently and sending them over a sync channel
lets us keep a single writer instance that we write through. This
prevents the edge case of potentially writing many small files when
DataFusion decides to produce lots of partition streams that are
relatively small. Since we now maintain one writer, we just keep
writing from all partition streams into it.
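
To illustrate the idea, here is a minimal std-only sketch, not the actual delta-rs implementation. It assumes OS threads and `std::sync::mpsc::sync_channel` as stand-ins for the async partition streams, and `Batch`, `SingleWriter`, `write_partitions`, and `target_rows_per_file` are hypothetical names invented for this example. Each partition is drained concurrently, but every batch goes through one writer, so small partitions are merged into the same file.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical stand-in for a record batch: just a vector of rows.
type Batch = Vec<i64>;

/// Hypothetical stand-in for the single writer. It buffers rows and only
/// emits a "file" when the target size is reached or when it is closed.
struct SingleWriter {
    buffer: Vec<i64>,
    target_rows_per_file: usize,
    files: Vec<Vec<i64>>,
}

impl SingleWriter {
    fn new(target_rows_per_file: usize) -> Self {
        assert!(target_rows_per_file > 0, "target size must be positive");
        Self { buffer: Vec::new(), target_rows_per_file, files: Vec::new() }
    }

    /// Append a batch, flushing full files as the buffer fills up.
    fn write(&mut self, batch: Batch) {
        self.buffer.extend(batch);
        while self.buffer.len() >= self.target_rows_per_file {
            let rest = self.buffer.split_off(self.target_rows_per_file);
            let file = std::mem::replace(&mut self.buffer, rest);
            self.files.push(file);
        }
    }

    /// Flush whatever is left and return all files written.
    fn close(mut self) -> Vec<Vec<i64>> {
        if !self.buffer.is_empty() {
            let last = std::mem::take(&mut self.buffer);
            self.files.push(last);
        }
        self.files
    }
}

/// Drain every partition concurrently, funnel the batches through a bounded
/// channel, and write all of them through one writer.
fn write_partitions(
    partitions: Vec<Vec<Batch>>,
    target_rows_per_file: usize,
) -> Vec<Vec<i64>> {
    // Bounded channel: producers block when the writer falls behind.
    let (tx, rx) = sync_channel::<Batch>(2);

    let producers: Vec<_> = partitions
        .into_iter()
        .map(|partition| {
            let tx = tx.clone();
            thread::spawn(move || {
                for batch in partition {
                    // Stop early if the receiving side has gone away.
                    if tx.send(batch).is_err() {
                        break;
                    }
                }
            })
        })
        .collect();
    // Drop the original sender so the receiver ends once all producers finish.
    drop(tx);

    let mut writer = SingleWriter::new(target_rows_per_file);
    for batch in rx {
        writer.write(batch);
    }
    for producer in producers {
        producer.join().expect("producer thread panicked");
    }
    writer.close()
}
```

In this sketch, three partitions of three rows each end up in a single file as long as `target_rows_per_file` is larger than nine, whereas one writer per partition would have produced three small files.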
I guess this stacks well with your PR @abhiaagarwal? Wdyt?
- closes #3871
With the same code as in the above issue, we now create only one file
instead of 3 small files:
```json
{"protocol":{"minReaderVersion":1,"minWriterVersion":2}}
{"metaData":{"id":"0d1846c0-71e6-4c6d-aba1-5b9f8d752937","name":null,"description":null,"format":{"provider":"parquet","options":{}},"schemaString":"{\"type\":\"struct\",\"fields\":[{\"name\":\"foo\",\"type\":\"long\",\"nullable\":true,\"metadata\":{}}]}","partitionColumns":[],"createdTime":1760889081860,"configuration":{}}}
{"add":{"path":"part-00000-352305f0-eff1-44d3-993e-6bd31cdd118c-c000.snappy.parquet","partitionValues":{},"size":510,"modificationTime":1760889081877,"dataChange":true,"stats":"{\"numRecords\":9,\"minValues\":{\"foo\":1},\"maxValues\":{\"foo\":3},\"nullCount\":{\"foo\":0}}","tags":null,"baseRowId":null,"defaultRowCommitVersion":null,"clusteringProvider":null}}
{"commitInfo":{"timestamp":1760889081878,"operation":"WRITE","operationParameters":{"mode":"ErrorIfExists"},"engineInfo":"delta-rs:py-1.2.0","clientVersion":"delta-rs.py-1.2.0","operationMetrics":{"execution_time_ms":19,"num_added_files":1,"num_added_rows":9,"num_partitions":0,"num_removed_files":0}}}
```
---------
Signed-off-by: Ion Koutsouris <[email protected]>
Signed-off-by: Ion Koutsouris
1 parent d0ae849 commit f7688c4
2 files changed, +241 -168 lines changed
- crates/core/src/operations/write
- python