dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented May 11, 2025

What changes were proposed in this pull request?

This PR aims to support DataStreamReader and DataStreamWriter.

Why are the changes needed?

For feature parity.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs with the newly added test case.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun
Member Author

Could you review this too, @viirya ?

@viirya
Member

viirya commented May 11, 2025

Yea


/// Specifies the input data source format.
/// - Parameter source: A string.
/// - Returns: A ``DataStreamReader``.
Member

Suggested change
- /// - Returns: A ``DataStreamReader``.
+ /// - Returns: A `DataStreamReader`.

Member

Just following other methods in this PR.

Member

Or maybe other methods are incorrect but this is correct?

Member Author

Both are correct~ Single backticks only format the name as code. Double backticks format it as code and also link to the symbol.

Member

Yea, I guess so. Just checked if a consistent style could be better.
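
To make the distinction discussed in this thread concrete, here is a minimal sketch of both styles in Swift documentation comments. `ExampleReader` and its methods are hypothetical stand-ins invented for illustration; they are not the `DataStreamReader` API from this PR. The sketch assumes the comments are rendered by Swift-DocC, where double backticks produce a symbol link and single backticks produce plain code formatting.

```swift
/// A hypothetical stand-in type, used only to illustrate the comment syntax.
struct ExampleReader {
  var source: String = ""

  /// Specifies the input data source format.
  /// - Parameter source: A string.
  /// - Returns: An ``ExampleReader`` (double backticks: code font plus a symbol link).
  func format(_ source: String) -> ExampleReader {
    var copy = self
    copy.source = source
    return copy
  }

  /// Returns the configured source name.
  /// - Returns: A `String` (single backticks: code font only, no link).
  func currentSource() -> String {
    return source
  }
}
```

Both forms compile identically, since they are only comments; the difference shows up in the generated documentation, which is why the suggestion here is a matter of style consistency rather than correctness.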

/// Define a Streaming DataFrame on a Table. The DataSource corresponding to the table should
/// support streaming mode.
/// - Parameter tableName: The name of the table.
/// - Returns: A ``DataFrame``.
Member

Suggested change
- /// - Returns: A ``DataFrame``.
+ /// - Returns: A `DataFrame`.

/// started with `start()`. This name must be unique among all the currently active queries in
/// the associated SparkSession.
/// - Parameter queryName: A string name.
/// - Returns: A ``DataStreamWriter``.
Member

Suggested change
- /// - Returns: A ``DataStreamWriter``.
+ /// - Returns: A `DataStreamWriter`.

/// aggregations, it will be equivalent to `append` mode.
///
/// - Parameter outputMode: A string for outputMode.
/// - Returns: A ``DataStreamWriter``.
Member

Suggested change
- /// - Returns: A ``DataStreamWriter``.
+ /// - Returns: A `DataStreamWriter`.

/// Partitions the output by the given columns on the file system. If specified, the output is
/// laid out on the file system similar to Hive's partitioning scheme.
/// - Parameter colNames: Column names to partition.
/// - Returns: A ``DataStreamWriter``.
Member

Suggested change
- /// - Returns: A ``DataStreamWriter``.
+ /// - Returns: A `DataStreamWriter`.

/// given path as new data arrives. The returned ``StreamingQuery`` object can be used to interact
/// with the stream.
/// - Parameter path: A path to write.
/// - Returns: A ``StreamingQuery``.
Member

Suggested change
- /// - Returns: A ``StreamingQuery``.
+ /// - Returns: A `StreamingQuery`.

/// given table as new data arrives. The returned ``StreamingQuery`` object can be used to interact
/// with the stream.
/// - Parameter tableName: A table name.
/// - Returns: A ``StreamingQuery``.
Member

Suggested change
- /// - Returns: A ``StreamingQuery``.
+ /// - Returns: A `StreamingQuery`.

/// - The SQL configuration `spark.sql.streaming.stopActiveRunOnRestart` is enabled
/// - The active run cannot be stopped within the timeout controlled by the SQL configuration `spark.sql.streaming.stopTimeout`
///
/// - Returns: A ``StreamingQuery``.
Member

Suggested change
- /// - Returns: A ``StreamingQuery``.
+ /// - Returns: A `StreamingQuery`.

@dongjoon-hyun
Member Author

Thank you for the review. You are right about the inconsistency between single and double backticks. I'm trying to use double backticks more often to add a link to the class.

@dongjoon-hyun
Member Author

Let me merge this~ As for the inconsistency, I'll try to clean up the rest of them later.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-52069 branch May 11, 2025 18:43
@viirya
Member

viirya commented May 11, 2025

Got it. Thank you @dongjoon-hyun

@dongjoon-hyun
Member Author

Thank you always~ 😄
