
Commit 25c5d57

tools4origins authored and dongjoon-hyun committed
[MINOR][DOC] Fix python variance() documentation
## What changes were proposed in this pull request?

The Python documentation incorrectly says that `variance()` acts as `var_pop` whereas it acts like `var_samp` here: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.variance

It was not the case in Spark 1.6 doc but it is in Spark 2.0 doc:
https://spark.apache.org/docs/1.6.0/api/java/org/apache/spark/sql/functions.html
https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/functions.html

The Scala documentation is correct: https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/functions.html#variance-org.apache.spark.sql.Column-

The alias is set on this line:
https://github.com/apache/spark/blob/v2.4.3/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L786

## How was this patch tested?

Using variance() in pyspark 2.4.3 returns:
```
>>> spark.createDataFrame([(1, ), (2, ), (3, )], "a: int").select(variance("a")).show()
+-----------+
|var_samp(a)|
+-----------+
|        1.0|
+-----------+
```

Closes apache#24895 from tools4origins/patch-1.

Authored-by: tools4origins <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
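The `1.0` in the test output is what tells the two definitions apart: for the values 1, 2, 3 the sample variance (divide by n - 1) is 1.0, while the population variance (divide by n) is roughly 0.667. A minimal sketch of that arithmetic follows. It uses Python's standard `statistics` module instead of Spark, so it only illustrates the two formulas and is not the PySpark code path.

```python
import statistics

# Same three values as the DataFrame in the test, as floats.
values = [1.0, 2.0, 3.0]

# Sample variance: squared deviations from the mean, divided by (n - 1).
# This is the definition that var_samp (and therefore variance) follows.
sample_var = statistics.variance(values)

# Population variance: squared deviations from the mean, divided by n.
# This is the definition that var_pop follows.
population_var = statistics.pvariance(values)

print("sample variance:", sample_var)          # 1.0
print("population variance:", population_var)  # about 0.667
```

Because the mean is 2 and the squared deviations sum to 2, dividing by n - 1 = 2 gives 1.0 and dividing by n = 3 gives 2/3, which matches the `var_samp(a)` column header shown by Spark.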
1 parent ea0e119 commit 25c5d57

1 file changed: +4 -4 lines changed

python/pyspark/sql/functions.py

Lines changed: 4 additions & 4 deletions
@@ -209,14 +209,14 @@ def _():
     """
 _functions_1_6_over_column = {
     # unary math functions
-    'stddev': 'Aggregate function: returns the unbiased sample standard deviation of' +
-              ' the expression in a group.',
+    'stddev': 'Aggregate function: alias for stddev_samp.',
     'stddev_samp': 'Aggregate function: returns the unbiased sample standard deviation of' +
                    ' the expression in a group.',
     'stddev_pop': 'Aggregate function: returns population standard deviation of' +
                   ' the expression in a group.',
-    'variance': 'Aggregate function: returns the population variance of the values in a group.',
-    'var_samp': 'Aggregate function: returns the unbiased variance of the values in a group.',
+    'variance': 'Aggregate function: alias for var_samp.',
+    'var_samp': 'Aggregate function: returns the unbiased sample variance of' +
+                ' the values in a group.',
     'var_pop': 'Aggregate function: returns the population variance of the values in a group.',
     'skewness': 'Aggregate function: returns the skewness of the values in a group.',
     'kurtosis': 'Aggregate function: returns the kurtosis of the values in a group.',
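
The changed strings are values in a dictionary keyed by function name, which suggests that each entry ends up as the docstring of a generated function (the hunk header shows an enclosing `def _():`). The following is a hypothetical sketch of that general pattern, not the PySpark implementation: `_make_documented_function`, `_docs`, and the string-returning body are made up for illustration.

```python
def _make_documented_function(name, doc):
    """Build a placeholder function whose name and docstring come from a table."""
    def _(col):
        # Hypothetical stand-in body: it only formats a string, where a real
        # implementation would dispatch to the underlying engine.
        return "{}({})".format(name, col)
    _.__name__ = name
    _.__doc__ = doc
    return _


# Two of the entries from the diff, used here as sample data.
_docs = {
    'variance': 'Aggregate function: alias for var_samp.',
    'var_samp': 'Aggregate function: returns the unbiased sample variance of' +
                ' the values in a group.',
}

# One generated function per dictionary entry.
functions = {name: _make_documented_function(name, doc)
             for name, doc in _docs.items()}

print(functions['variance'].__doc__)
print(functions['variance']('a'))
```

Under this pattern the dictionary is the single source of the user-facing documentation, so a wrong description there, like the old `variance` entry, shows up directly in `help()` output and in the generated API docs.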