@dongjoon-hyun dongjoon-hyun commented Apr 12, 2025

What changes were proposed in this pull request?

This PR aims to support the xml API in DataFrameReader and DataFrameWriter.

Why are the changes needed?

The xml API was newly added in Apache Spark 4.0.0. We should support it for feature parity.

https://github.com/apache/spark/blob/e0801d9d8e33cd8835f3e3beed99a3588c16b776/sql/api/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L394-L403

  /**
   * Loads a XML file and returns the result as a `DataFrame`. See the documentation on the other
   * overloaded `xml()` method for more details.
   *
   * @since 4.0.0
   */
  def xml(path: String): DataFrame = {
    // This method ensures that calls that explicit need single argument works, see SPARK-16009
    xml(Seq(path): _*)
  }
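
For readers unfamiliar with the overload pattern in the snippet above, here is a minimal, Spark-free sketch of how a single-argument method can delegate to a varargs overload. `PathReader` and its return type are hypothetical stand-ins for illustration only; they are not Spark's `DataFrameReader`, and the real method returns a `DataFrame`.

```scala
// Hypothetical stand-in for a reader class; NOT Spark's DataFrameReader.
final class PathReader {
  // Varargs overload: accepts zero or more paths and tags each one.
  def xml(paths: String*): Seq[String] = paths.map(p => s"xml:$p")

  // Single-argument overload: wraps the path in a Seq and expands it with
  // `: _*`, so the call resolves to the varargs overload above.
  def xml(path: String): Seq[String] = xml(Seq(path): _*)
}
```

Under these assumptions, `new PathReader().xml("a.xml")` and `new PathReader().xml("a.xml", "b.xml")` both end up in the varargs method, which mirrors the delegation shown in the quoted Spark source.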

Does this PR introduce any user-facing change?

No, this is a new addition.

How was this patch tested?

Pass the CIs.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun

Thank you, @viirya. Merged to main.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-51784 branch April 12, 2025 21:23