@dongjoon-hyun commented Apr 2, 2025

What changes were proposed in this pull request?

This PR aims to support DataFrameWriter.

Why are the changes needed?

For feature parity.

Does this PR introduce any user-facing change?

No, because this is a new addition to the unreleased version.

How was this patch tested?

Pass the CIs with the newly added test suite, or run it manually:

$ swift test --filter DataFrameWriterTests
Building for debugging...
[13/13] Compiling SparkConnectTests DataFrameWriterTests.swift
Build complete! (3.63s)
Test Suite 'Selected tests' started at 2025-04-02 17:01:14.353.
Test Suite 'SparkConnectPackageTests.xctest' started at 2025-04-02 17:01:14.354.
Test Suite 'SparkConnectPackageTests.xctest' passed at 2025-04-02 17:01:14.354.
	 Executed 0 tests, with 0 failures (0 unexpected) in 0.000 (0.000) seconds
Test Suite 'Selected tests' passed at 2025-04-02 17:01:14.354.
	 Executed 0 tests, with 0 failures (0 unexpected) in 0.000 (0.002) seconds
◇ Test run started.
↳ Testing Library Version: 102 (arm64e-apple-macos13.0)
◇ Suite DataFrameWriterTests started.
◇ Test orc() started.
◇ Test pathAlreadyExist() started.
◇ Test csv() started.
◇ Test overwrite() started.
◇ Test sortByBucketBy() started.
◇ Test json() started.
◇ Test save() started.
◇ Test partitionBy() started.
◇ Test parquet() started.
✔ Test sortByBucketBy() passed after 0.072 seconds.
✔ Test pathAlreadyExist() passed after 0.396 seconds.
✔ Test overwrite() passed after 0.504 seconds.
✔ Test orc() passed after 0.515 seconds.
✔ Test json() passed after 0.524 seconds.
✔ Test parquet() passed after 0.539 seconds.
✔ Test partitionBy() passed after 0.549 seconds.
✔ Test csv() passed after 0.569 seconds.
✔ Test save() passed after 1.001 seconds.
✔ Suite DataFrameWriterTests passed after 1.002 seconds.
✔ Test run with 9 tests passed after 1.002 seconds.

Was this patch authored or co-authored using generative AI tooling?

No.

try await #require(throws: Error.self) {
    try await df.write.sortBy("col2").csv(tmpDir)
}
try await #require(throws: Error.self) {

@peter-toth Apr 2, 2025

Nit: technically, Spark produces three different errors:

  • when only sortBy is specified,
  • when only bucketBy is specified, and
  • when both are specified,

so adding a bucketBy-only test case might make sense.
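The three cases can be laid out as a small, self-contained Swift sketch. It is only an illustrative model of the point made here: the enum, its case names, and the function are hypothetical, are not part of Spark or the Swift client, and do not reproduce Spark's actual error types or messages.

```swift
// Hypothetical model of the three rejection cases; these names are
// illustrative and are not part of Spark or the Swift client.
enum LayoutError: Error, Equatable {
    case sortByOnly         // sortBy given without bucketBy
    case bucketByOnly       // bucketBy given without sortBy
    case sortByAndBucketBy  // both given
}

/// Returns which (hypothetical) error a write such as csv(tmpDir) would hit
/// for the given combination, or nil when neither option is set.
func layoutError(hasSortBy: Bool, hasBucketBy: Bool) -> LayoutError? {
    switch (hasSortBy, hasBucketBy) {
    case (true, false): return .sortByOnly
    case (false, true): return .bucketByOnly
    case (true, true): return .sortByAndBucketBy
    case (false, false): return nil
    }
}

// One entry per combination that a test suite would want to cover.
let combinations: [(sortBy: Bool, bucketBy: Bool)] = [
    (sortBy: true, bucketBy: false),
    (sortBy: false, bucketBy: true),
    (sortBy: true, bucketBy: true),
]

for c in combinations {
    let error = layoutError(hasSortBy: c.sortBy, hasBucketBy: c.bucketBy)
    print("sortBy=\(c.sortBy) bucketBy=\(c.bucketBy) -> \(String(describing: error))")
}
```

Since each combination maps to a distinct case, a test that covers only sortBy and sortBy with bucketBy leaves the bucketBy-only path unexercised.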

@dongjoon-hyun

Thank you, @peter-toth .

@dongjoon-hyun

Merged to main.

@dongjoon-hyun deleted the SPARK-51689 branch April 2, 2025 13:51