[SPARK-52069] Support `DataStreamReader` and `DataStreamWriter` #126

Closed

dongjoon-hyun wants to merge 2 commits into apache:main from dongjoon-hyun:SPARK-52069

Member

dongjoon-hyun commented May 11, 2025

What changes were proposed in this pull request?

This PR aims to support DataStreamReader and DataStreamWriter.
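
As a rough illustration of the builder style that the doc comments in this PR describe (`queryName`, `outputMode`, and `partitionBy` each return a `DataStreamWriter`, and `start()` returns a `StreamingQuery`), here is a minimal, self-contained sketch. The `SketchStreamWriter` and `SketchStreamingQuery` types, their unlabeled parameters, and the synchronous in-memory behavior are all assumptions made for illustration; they are not the actual types or signatures added by this PR.

```swift
// Hypothetical stand-in for the query handle returned by `start()`.
struct SketchStreamingQuery {
  let name: String
  let outputMode: String
  let partitionColumns: [String]
}

// Hypothetical stand-in for a writer whose setters return a new writer,
// so calls can be chained. Nothing here talks to a Spark Connect server.
struct SketchStreamWriter {
  var name = ""
  var mode = "append"  // placeholder default chosen for this sketch
  var columns: [String] = []

  // Records the query name and returns an updated copy of the writer.
  func queryName(_ queryName: String) -> SketchStreamWriter {
    var copy = self
    copy.name = queryName
    return copy
  }

  // Records the output mode and returns an updated copy of the writer.
  func outputMode(_ outputMode: String) -> SketchStreamWriter {
    var copy = self
    copy.mode = outputMode
    return copy
  }

  // Records the partition columns and returns an updated copy of the writer.
  func partitionBy(_ colNames: String...) -> SketchStreamWriter {
    var copy = self
    copy.columns = colNames
    return copy
  }

  // Builds the query handle from the accumulated settings.
  func start() -> SketchStreamingQuery {
    return SketchStreamingQuery(name: name, outputMode: mode, partitionColumns: columns)
  }
}

// Usage: each call returns a writer, so the configuration reads as one chain.
let query = SketchStreamWriter()
  .queryName("events")
  .outputMode("complete")
  .partitionBy("year", "month")
  .start()
print(query.name, query.outputMode, query.partitionColumns)
```

Returning a fresh copy from each setter keeps the sketch free of shared mutable state; whether the real types are value types or reference types is not stated in this conversation.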

Why are the changes needed?

For feature parity.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs with the newly added test case.

Was this patch authored or co-authored using generative AI tooling?

No.

dongjoon-hyun force-pushed the SPARK-52069 branch from d68db21 to 8e623c7

May 11, 2025 15:25

[SPARK-52069] Support DataStreamReader and DataStreamWriter

41d252c

dongjoon-hyun force-pushed the SPARK-52069 branch from 3069a80 to 41d252c

May 11, 2025 16:11

doc

aead0f2

Member Author

dongjoon-hyun commented May 11, 2025

Could you review this too, @viirya?

Member

viirya commented May 11, 2025

Yea

viirya approved these changes

Sources/SparkConnect/DataStreamReader.swift

+  /// Specifies the input data source format.
+  /// - Parameter source: A string.
+  /// - Returns: A ``DataStreamReader``.

Member

viirya May 11, 2025

Suggested change

-  /// - Returns: A ``DataStreamReader``.
+  /// - Returns: A `DataStreamReader`.

Member

viirya May 11, 2025

Just following other methods in this PR.

Member

viirya May 11, 2025

Or maybe other methods are incorrect but this is correct?

Member Author

dongjoon-hyun May 11, 2025

Both are correct. A single backquote formats the name as code. A double backquote formats it as code and also links to the symbol.
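
To make the distinction concrete, here is a minimal sketch of both styles in Swift documentation comments. `ExampleReader` is a hypothetical stand-in type, not part of this PR; the sketch assumes the docs are built with DocC, where double backticks around a symbol name produce a link to that symbol's page and single backticks only apply code formatting.

```swift
/// A hypothetical stand-in type, used only to illustrate doc-comment markup.
struct ExampleReader {
  var source: String = ""

  /// Specifies the input data source format.
  /// - Parameter source: A string.
  /// - Returns: An ``ExampleReader`` (double backticks: code formatting plus a symbol link).
  func format(_ source: String) -> ExampleReader {
    var copy = self
    copy.source = source
    return copy
  }

  /// Returns the configured source name.
  /// - Returns: A `String` (single backticks: code formatting only, no link).
  func describe() -> String {
    return source
  }
}

// The markup only affects the generated documentation; the code behaves the same either way.
let reader = ExampleReader().format("rate")
print(reader.describe())
```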

Member

viirya May 11, 2025

Yea, I guess so. I was just checking whether a consistent style would be better.

Sources/SparkConnect/DataStreamReader.swift

+  /// Define a Streaming DataFrame on a Table. The DataSource corresponding to the table should
+  /// support streaming mode.
+  /// - Parameter tableName: The name of the table.
+  /// - Returns: A ``DataFrame``.

Member

viirya May 11, 2025

Suggested change

-  /// - Returns: A ``DataFrame``.
+  /// - Returns: A `DataFrame`.

Sources/SparkConnect/DataStreamWriter.swift

+  /// started with `start()`. This name must be unique among all the currently active queries in
+  /// the associated SparkSession.
+  /// - Parameter queryName: A string name.
+  /// - Returns: A ``DataStreamWriter``.

Member

viirya May 11, 2025

Suggested change

-  /// - Returns: A ``DataStreamWriter``.
+  /// - Returns: A `DataStreamWriter`.

Sources/SparkConnect/DataStreamWriter.swift

+  /// aggregations, it will be equivalent to `append` mode.
+  ///
+  /// - Parameter outputMode: A string for outputMode.
+  /// - Returns: A ``DataStreamWriter``.

Member

viirya May 11, 2025

Suggested change

-  /// - Returns: A ``DataStreamWriter``.
+  /// - Returns: A `DataStreamWriter`.

Sources/SparkConnect/DataStreamWriter.swift

+  /// Partitions the output by the given columns on the file system. If specified, the output is
+  /// laid out on the file system similar to Hive's partitioning scheme.
+  /// - Parameter colNames: Column names to partition.
+  /// - Returns: A ``DataStreamWriter``.

Member

viirya May 11, 2025

Suggested change

-  /// - Returns: A ``DataStreamWriter``.
+  /// - Returns: A `DataStreamWriter`.

Sources/SparkConnect/DataStreamWriter.swift

+  /// given path as new data arrives. The returned ``StreamingQuery`` object can be used to interact
+  /// with the stream.
+  /// - Parameter path: A path to write.
+  /// - Returns: A ``StreamingQuery``.

Member

viirya May 11, 2025

Suggested change

-  /// - Returns: A ``StreamingQuery``.
+  /// - Returns: A `StreamingQuery`.

Sources/SparkConnect/DataStreamWriter.swift

+  /// given table as new data arrives. The returned ``StreamingQuery`` object can be used to interact
+  /// with the stream.
+  /// - Parameter tableName: A table name.
+  /// - Returns: A ``StreamingQuery``.

Member

viirya May 11, 2025

Suggested change

-  /// - Returns: A ``StreamingQuery``.
+  /// - Returns: A `StreamingQuery`.

Sources/SparkConnect/DataStreamWriter.swift

+  /// - The SQL configuration `spark.sql.streaming.stopActiveRunOnRestart` is enabled
+  /// - The active run cannot be stopped within the timeout controlled by the SQL configuration `spark.sql.streaming.stopTimeout`
+  ///
+  /// - Returns: A ``StreamingQuery``.

Member

viirya May 11, 2025

Suggested change

-  /// - Returns: A ``StreamingQuery``.
+  /// - Returns: A `StreamingQuery`.

Member Author

dongjoon-hyun commented May 11, 2025

Thank you for the review. You are right about the inconsistency between single and double backticks. I'm trying to use double backticks more often to add a link to the class.

Member Author

dongjoon-hyun commented May 11, 2025

Let me merge this. As for the inconsistency, I'll try to clean up the rest of them later.

dongjoon-hyun closed this in c5f1ff7

dongjoon-hyun deleted the SPARK-52069 branch

May 11, 2025 18:43

Member

viirya commented May 11, 2025

Got it. Thank you @dongjoon-hyun

Member Author

dongjoon-hyun commented May 11, 2025

Thank you always~ 😄
