dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Jul 10, 2025

What changes were proposed in this pull request?

This PR aims to add Apache Spark 4.1.0-preview1 (RC1) test coverage.

Why are the changes needed?

To be ready for 4.1.0-preview1 release.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun dongjoon-hyun force-pushed the SPARK-52744 branch 2 times, most recently from 5d90e9a to 6e18f9e Compare July 10, 2025 03:34
@dongjoon-hyun dongjoon-hyun marked this pull request as draft July 10, 2025 03:44
@dongjoon-hyun dongjoon-hyun force-pushed the SPARK-52744 branch 2 times, most recently from b3651c2 to e8c02dc Compare July 10, 2025 05:38
@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review July 10, 2025 05:39
@viirya
Member

viirya commented Jul 10, 2025

Is it a real error?

SparkConnect/DataFrame.swift:434: Assertion failed

@dongjoon-hyun
Member Author

No, @viirya. It seems to be a difference in the JVM time zone database.
Since the value is the same for UTC and GMT, I generalized the assertion.

  case .timeInfo(.timestamp):
    let timestampType = column.data.type as! ArrowTypeTimestamp
-   assert(timestampType.timezone == "Etc/UTC")
+   assert(timestampType.timezone == "Etc/UTC" || timestampType.timezone == "GMT")
Member Author

This is the updated part, @viirya .
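
For readers following the thread, the generalized assertion above amounts to accepting a small set of UTC-equivalent timezone identifiers. Below is a minimal sketch of that idea, assuming the timezone arrives as an optional string. The names `utcEquivalentTimezones` and `isUTCEquivalent` are hypothetical and are not part of the Spark Connect Swift client or of this PR.

```swift
// Hypothetical helper for illustration; not code from this PR.
// Timezone identifiers treated as equivalent to UTC for the check.
// Only the two values discussed in the thread are listed here.
let utcEquivalentTimezones: Set<String> = ["Etc/UTC", "GMT"]

/// Returns true when `timezone` is one of the accepted identifiers.
/// A missing timezone (nil) is treated as not matching.
func isUTCEquivalent(_ timezone: String?) -> Bool {
  guard let timezone = timezone else { return false }
  return utcEquivalentTimezones.contains(timezone)
}
```

With such a helper, the assertion could read `assert(isUTCEquivalent(timestampType.timezone))`, and further aliases could be added to the set in one place if other environments report different names.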

  let spark = try await SparkSession.builder.getOrCreate()
  let version = await spark.version
- #expect(version.starts(with: "4.0.0") || version.starts(with: "3.5."))
+ #expect(version.starts(with: "4.") || version.starts(with: "3.5."))
Member Author

This is updated. Previously, I didn't expect a 4.1.x release in 2025.

Member

Yea, I saw this.
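
The relaxed version check discussed above accepts any `4.x` server in addition to `3.5.x`. A minimal sketch of the same prefix test as a standalone function follows. The names `supportedVersionPrefixes` and `isSupportedVersion` are hypothetical, and the prefix list is taken only from the diff in this thread.

```swift
// Hypothetical helper for illustration; not code from this PR.
// Version prefixes accepted by the test, as shown in the diff above.
let supportedVersionPrefixes = ["4.", "3.5."]

/// Returns true when `version` begins with any accepted prefix.
func isSupportedVersion(_ version: String) -> Bool {
  return supportedVersionPrefixes.contains(where: { version.starts(with: $0) })
}
```

Keeping the trailing dot in each prefix avoids accidental matches such as `40.0` or `3.50`.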

@dongjoon-hyun
Member Author

dongjoon-hyun commented Jul 10, 2025

Sorry. It seems that the GitHub Actions environment differs from my local environment. Let me dig a little more.

Test timestamp() started.
SparkConnect/DataFrame.swift:434: Assertion failed

Locally,

$ swift test --filter DataFrameTests | grep timestamp
Building for debugging...
[4/4] Write swift-version--2C13E65FA118F995.txt
Build complete! (0.22s)
◇ Test timestamp() started.
✔ Test timestamp() passed after 0.022 seconds.

@Test
func timestamp() async throws {
let spark = try await SparkSession.builder.getOrCreate()
// TODO(SPARK-52747)
Member Author

Timestamp support is not released yet. I created SPARK-52747 to investigate the timezone issue completely.

Member

Okay. Thanks @dongjoon-hyun

@dongjoon-hyun
Member Author

Thank you, @viirya . I'll merge this for now and do the follow-up for the TODO JIRAs.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-52744 branch July 10, 2025 06:34
dongjoon-hyun added a commit that referenced this pull request Jul 10, 2025
### What changes were proposed in this pull request?

This PR aims to support the `defineDataset` API in order to support `Declarative Pipelines` (SPARK-51727) of Apache Spark `4.1.0-preview1`.

### Why are the changes needed?

To support the new feature incrementally.

### Does this PR introduce _any_ user-facing change?

No, this is a new feature.

### How was this patch tested?

Pass the CIs with `4.1.0-preview1` test pipeline because we added it.
- #210

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #211 from dongjoon-hyun/SPARK-52748.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun added a commit that referenced this pull request Jul 10, 2025
### What changes were proposed in this pull request?

This PR aims to support the `defineFlow` API in order to support `Declarative Pipelines` (SPARK-51727) of Apache Spark `4.1.0-preview1`.

### Why are the changes needed?

To support the new feature incrementally.

### Does this PR introduce _any_ user-facing change?

No, this is a new feature.

### How was this patch tested?

Pass the CIs with `4.1.0-preview1` test pipeline.
- #210

<img width="1000" height="373" alt="Screenshot 2025-07-10 at 07 25 37" src="https://github.com/user-attachments/assets/b4e214f6-de6c-4c31-8482-58e8de1dd4ff" />

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #212 from dongjoon-hyun/SPARK-52756.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun added a commit that referenced this pull request Aug 25, 2025
…sting

### What changes were proposed in this pull request?

This PR aims to use `release` build in Apache Spark `4.1.0-preview1` testing.

### Why are the changes needed?

Although we have used the `release` build in CI since SPARK-52085, we forgot to enable it when we added a new test job in SPARK-52744.
- #136
- #210

To be consistent, we need to use the `release` build in all CI jobs.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #220 from dongjoon-hyun/SPARK-53374.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>