Conversation

@zhongyujiang
Contributor

Purpose

Linked issue: part of #4816

Support the Spark DataSource V2 write path to reduce write serialization overhead and accelerate writing to primary key tables in Spark. Currently only fixed-bucket tables are supported.

Tests

BucketFunctionTest, SparkWriteITCase

PaimonSourceWriteBenchmark:

Benchmark                           Mode  Cnt   Score    Error  Units
PaimonSourceWriteBenchmark.v1Write    ss    5  13.845 ± 23.192   s/op
PaimonSourceWriteBenchmark.v2Write    ss    5   9.579 ± 14.929   s/op

API and Format

Documentation

Add a config spark.sql.paimon.use-v2-write to enable switching to the v2 write path; it falls back to the v1 write when an unsupported scenario is encountered (e.g. a HASH_DYNAMIC bucket mode table).
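The flag can be toggled at the session level, e.g. (a sketch assuming the standard Spark SQL SET syntax; only the config key itself comes from this PR):

```sql
-- Opt in to the DataSource V2 write path for this session
SET spark.sql.paimon.use-v2-write = true;

-- Writes that hit an unsupported case (e.g. HASH_DYNAMIC bucket mode)
-- silently fall back to the v1 write path
INSERT INTO paimon_pk_table SELECT * FROM source_table;
```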

Note: this is a draft PR covering the overall change; it will be split into smaller PRs for easier review.

import org.apache.spark.sql.catalyst.InternalRow;

import java.io.Serializable;

/** Wraps a Spark {@link InternalRow} as a Paimon {@link org.apache.paimon.data.InternalRow}. */
public class SparkInternalRowWrapper implements org.apache.paimon.data.InternalRow, Serializable {
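The wrapper adapts Spark's row representation to Paimon's row interface by delegating field accessors, so rows flow through the write path without a per-row copy. A simplified, self-contained sketch of that adapter idea (the interface and class names below are illustrative stand-ins, not the actual Spark or Paimon APIs):

```java
import java.io.Serializable;

/** Illustrative stand-in for the source row type (plays the role of Spark's InternalRow). */
interface SourceRow {
    int numFields();
    int getInt(int ordinal);
}

/** Illustrative stand-in for the target row interface (plays the role of Paimon's InternalRow). */
interface TargetRow {
    int getFieldCount();
    int getInt(int pos);
}

/** Adapter: exposes a SourceRow through the TargetRow interface without copying values. */
class RowWrapper implements TargetRow, Serializable {
    private final SourceRow row;

    RowWrapper(SourceRow row) {
        this.row = row;
    }

    @Override
    public int getFieldCount() {
        // Delegate directly; no intermediate row object is materialized.
        return row.numFields();
    }

    @Override
    public int getInt(int pos) {
        return row.getInt(pos);
    }
}
```

Avoiding the serialization round-trip and reading fields lazily through such a wrapper is the kind of overhead reduction the benchmark above measures.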
Contributor


Oh, it looks like #5159 has some duplicated work.

Contributor Author


Yes, I'll rebase once it's merged.

@zhongyujiang
Contributor Author

Replaced by #5242 and #5531.
