-
Notifications
You must be signed in to change notification settings - Fork 6
[SPARK-52069] Support DataStreamReader
and DataStreamWriter
#126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d68db21
to
8e623c7
Compare
3069a80
to
41d252c
Compare
Could you review this too, @viirya ? |
Yea |
|
||
/// Specifies the input data source format. | ||
/// - Parameter source: A string. | ||
/// - Returns: A ``DataStreamReader``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// - Returns: A ``DataStreamReader``. | |
/// - Returns: A `DataStreamReader`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just following other methods in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe other methods are incorrect but this is correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both are correct~ A single backquote is for codifying. A double-backquote is for codifying and linking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I guess so. Just checked if a consistent style could be better.
/// Define a Streaming DataFrame on a Table. The DataSource corresponding to the table should | ||
/// support streaming mode. | ||
/// - Parameter tableName: The name of the table. | ||
/// - Returns: A ``DataFrame``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// - Returns: A ``DataFrame``. | |
/// - Returns: A `DataFrame`. |
/// started with `start()`. This name must be unique among all the currently active queries in | ||
/// the associated SparkSession. | ||
/// - Parameter queryName: A string name. | ||
/// - Returns: A ``DataStreamWriter``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// - Returns: A ``DataStreamWriter``. | |
/// - Returns: A `DataStreamWriter`. |
/// aggregations, it will be equivalent to `append` mode. | ||
/// | ||
/// - Parameter outputMode: A string for outputMode. | ||
/// - Returns: A ``DataStreamWriter``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// - Returns: A ``DataStreamWriter``. | |
/// - Returns: A `DataStreamWriter`. |
/// Partitions the output by the given columns on the file system. If specified, the output is | ||
/// laid out on the file system similar to Hive's partitioning scheme. | ||
/// - Parameter colNames: Column names to partition. | ||
/// - Returns: A ``DataStreamWriter``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// - Returns: A ``DataStreamWriter``. | |
/// - Returns: A `DataStreamWriter`. |
/// given path as new data arrives. The returned ``StreamingQuery`` object can be used to interact | ||
/// with the stream. | ||
/// - Parameter path: A path to write. | ||
/// - Returns: A ``StreamingQuery``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// - Returns: A ``StreamingQuery``. | |
/// - Returns: A `StreamingQuery`. |
/// given table as new data arrives. The returned ``StreamingQuery`` object can be used to interact | ||
/// with the stream. | ||
/// - Parameter tableName: A table name. | ||
/// - Returns: A ``StreamingQuery``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// - Returns: A ``StreamingQuery``. | |
/// - Returns: A `StreamingQuery`. |
/// - The SQL configuration `spark.sql.streaming.stopActiveRunOnRestart` is enabled | ||
/// - The active run cannot be stopped within the timeout controlled by the SQL configuration `spark.sql.streaming.stopTimeout` | ||
/// | ||
/// - Returns: A ``StreamingQuery``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// - Returns: A ``StreamingQuery``. | |
/// - Returns: A `StreamingQuery`. |
Thank you for review. You are right for the inconsistency for single and double backticks. I'm trying to use |
Let me merge this~ For inconsistency, I'll try to do clean-up later for the rest of them. |
Got it. Thank you @dongjoon-hyun |
Thank you always~ 😄 |
What changes were proposed in this pull request?
This PR aims to support
DataStreamReader
andDataStreamWriter
.Why are the changes needed?
For feature parity.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Pass the CIs with the newly added test case.
Was this patch authored or co-authored using generative AI tooling?
No.