

dongjoon-hyun (Member) commented Apr 2, 2025

What changes were proposed in this pull request?

This PR aims to support DataFrame.storageLevel.
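Here is a rough illustration of the property-style API in Swift. This is a minimal sketch. StorageLevel, DataFrame, and the init(level:) parameter are hypothetical stand-ins, not the spark-connect-swift types. The level is stubbed locally rather than requested from a Spark Connect server.

```swift
// Hypothetical stand-ins for illustration only; these are not the
// spark-connect-swift types. Field order follows PySpark's StorageLevel.
struct StorageLevel: Equatable {
  let useDisk: Bool
  let useMemory: Bool
  let useOffHeap: Bool
  let deserialized: Bool
  let replication: Int

  // Named after Spark's StorageLevel.NONE: not persisted, one replica.
  static let NONE = StorageLevel(
    useDisk: false, useMemory: false, useOffHeap: false,
    deserialized: false, replication: 1)
}

// Hypothetical DataFrame stub. The level is held locally so the sketch runs
// without a server; a real client would ask the Spark Connect server instead.
struct DataFrame {
  private let level: StorageLevel

  init(level: StorageLevel = .NONE) {
    self.level = level
  }

  // Read-only computed property: the call site is `df.storageLevel`,
  // the same shape as `ds.storageLevel` in Scala and `df.storageLevel` in PySpark.
  var storageLevel: StorageLevel {
    level
  }
}

let df = DataFrame()
print(df.storageLevel == .NONE)  // prints "true"
// df.storageLevel()  // does not compile: a StorageLevel value is not callable
```

Answering may require a server round trip. For that case, Swift 5.5 and later also allow an effectful read-only property (var storageLevel: StorageLevel { get async throws { ... } }). The call site then becomes try await df.storageLevel, still without parentheses.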

Why are the changes needed?

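For feature parity with the other Spark clients, such as Dataset.storageLevel in Scala and DataFrame.storageLevel in PySpark.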

Does this PR introduce any user-facing change?

No. This is a new addition to the unreleased version.

How was this patch tested?

Pass the CIs.

$ swift test --filter DataFrameTests.storageLevel
◇ Suite DataFrameTests started.
◇ Test storageLevel() started.
✔ Test storageLevel() passed after 0.075 seconds.
✔ Suite DataFrameTests passed after 0.075 seconds.
✔ Test run with 1 test passed after 0.075 seconds.

Was this patch authored or co-authored using generative AI tooling?

No.

return self
}

var storageLevel: StorageLevel {

viirya (Member) commented on the line above:

I wonder why this is a var instead of a function. For example, in Dataset.scala, it is def storageLevel: StorageLevel.

dongjoon-hyun (Member, Author) commented Apr 2, 2025

Thank you, @viirya. Yes, it is intentionally not a function. This follows the style of the existing clients, as the following sessions show.

scala> spark.range(1).storageLevel
Using Spark's default log4j profile: org/apache/spark/log4j2-pattern-layout-defaults.properties
res0: org.apache.spark.storage.StorageLevel = StorageLevel(1 replicas)

scala> spark.range(1).storageLevel()
cmd1.sc:1: org.apache.spark.storage.StorageLevel does not take parameters
val res1 = spark.range(1).storageLevel()
                                      ^
Compilation Failed
$ bin/pyspark --remote sc://localhost:15002
Python 3.9.21 (main, Dec  5 2024, 09:43:42)
[Clang 16.0.0 (clang-1600.0.26.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 4.0.0
      /_/

Using Python version 3.9.21 (main, Dec  5 2024 09:43:42)
Client connected to the Spark Connect server at localhost:15002
SparkSession available as 'spark'.
>>> spark.range(1).storageLevel
StorageLevel(False, False, False, False, 1)
>>> spark.range(1).storageLevel()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'StorageLevel' object is not callable
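The two sessions print the same storage level in different textual forms. The Swift sketch below renders both forms from the same five fields, using a hypothetical StorageLevel struct. The Python form follows PySpark's positional repr (useDisk, useMemory, useOffHeap, deserialized, replication). In the Scala form, the words used for flags that are set (disk, memory, offheap, deserialized) are assumed to follow Scala's StorageLevel.toString.

```swift
// Hypothetical value type for illustration; field order follows PySpark's
// StorageLevel(useDisk, useMemory, useOffHeap, deserialized, replication).
struct StorageLevel {
  let useDisk: Bool
  let useMemory: Bool
  let useOffHeap: Bool
  let deserialized: Bool
  let replication: Int

  // Scala-style text, e.g. "StorageLevel(1 replicas)": only the flags that
  // are set are listed, followed by the replica count.
  var scalaDescription: String {
    var parts: [String] = []
    if useDisk { parts.append("disk") }
    if useMemory { parts.append("memory") }
    if useOffHeap { parts.append("offheap") }
    if deserialized { parts.append("deserialized") }
    parts.append("\(replication) replicas")
    return "StorageLevel(\(parts.joined(separator: ", ")))"
  }

  // PySpark-style repr: all five fields positionally, Python booleans.
  var pythonDescription: String {
    func py(_ flag: Bool) -> String { flag ? "True" : "False" }
    return "StorageLevel(\(py(useDisk)), \(py(useMemory)), \(py(useOffHeap)), \(py(deserialized)), \(replication))"
  }
}

// The level reported for an uncached spark.range(1) in both sessions above.
let notCached = StorageLevel(
  useDisk: false, useMemory: false, useOffHeap: false,
  deserialized: false, replication: 1)
print(notCached.scalaDescription)   // StorageLevel(1 replicas)
print(notCached.pythonDescription)  // StorageLevel(False, False, False, False, 1)
```

Read this way, both REPL results describe the same level: nothing is kept on disk, in memory, or off-heap, the data is not stored deserialized, and there is one replica.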

dongjoon-hyun (Member, Author) commented:

Merged to main.
