Commit 16b2b27

gatorsmile authored and Robert Kruszewski committed
[SPARK-25908][SQL][FOLLOW-UP] Add back unionAll
This PR is to add back `unionAll`, which is widely used. The name is also consistent with ANSI SQL. We also have the corresponding `intersectAll` and `exceptAll`, which were introduced in Spark 2.4.

Added a test case in DataFrameSuite.

Closes apache#23131 from gatorsmile/addBackUnionAll.

Authored-by: gatorsmile <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
1 parent 8a48906 commit 16b2b27
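For orientation, here is a minimal sketch of the restored API from a Spark shell session; the `spark` session, the `import spark.implicits._` line, and the sample data are illustrative assumptions, not part of this commit:

```scala
// Assumes a running SparkSession named `spark` (e.g. in spark-shell).
import spark.implicits._

val df = Seq(1, 2, 2, 3).toDF("id")

// unionAll is once again a supported alias for union: like UNION ALL in SQL,
// it keeps duplicate rows.
val withDupes = df.unionAll(df)            // 8 rows, duplicates preserved
val sameThing = df.union(df)               // identical result; unionAll delegates to union

// For SQL-style UNION semantics (deduplicated), follow with distinct().
val deduped = df.unionAll(df).distinct()   // 3 rows: 1, 2, 3
```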

File tree

9 files changed: 42 additions & 18 deletions


R/pkg/NAMESPACE

Lines changed: 1 addition & 0 deletions
@@ -169,6 +169,7 @@ exportMethods("arrange",
               "toJSON",
               "transform",
               "union",
+              "unionAll",
               "unionByName",
               "unique",
               "unpersist",

R/pkg/R/DataFrame.R

Lines changed: 14 additions & 0 deletions
@@ -2724,6 +2724,20 @@ setMethod("union",
             dataFrame(unioned)
           })
 
+#' Return a new SparkDataFrame containing the union of rows
+#'
+#' This is an alias for `union`.
+#'
+#' @rdname union
+#' @name unionAll
+#' @aliases unionAll,SparkDataFrame,SparkDataFrame-method
+#' @note unionAll since 1.4.0
+setMethod("unionAll",
+          signature(x = "SparkDataFrame", y = "SparkDataFrame"),
+          function(x, y) {
+            union(x, y)
+          })
+
 #' Return a new SparkDataFrame containing the union of rows, matched by column names
 #'
 #' Return a new SparkDataFrame containing the union of rows in this SparkDataFrame

R/pkg/R/generics.R

Lines changed: 3 additions & 0 deletions
@@ -631,6 +631,9 @@ setGeneric("toRDD", function(x) { standardGeneric("toRDD") })
 #' @rdname union
 setGeneric("union", function(x, y) { standardGeneric("union") })
 
+#' @rdname union
+setGeneric("unionAll", function(x, y) { standardGeneric("unionAll") })
+
 #' @rdname unionByName
 setGeneric("unionByName", function(x, y) { standardGeneric("unionByName") })

R/pkg/tests/fulltests/test_sparkSQL.R

Lines changed: 1 addition & 0 deletions
@@ -2453,6 +2453,7 @@ test_that("union(), unionByName(), rbind(), except(), and intersect() on a DataF
   expect_equal(count(unioned), 6)
   expect_equal(first(unioned)$name, "Michael")
   expect_equal(count(arrange(suppressWarnings(union(df, df2)), df$age)), 6)
+  expect_equal(count(arrange(suppressWarnings(unionAll(df, df2)), df$age)), 6)
 
   df1 <- select(df2, "age", "name")
   unioned1 <- arrange(unionByName(df1, df), df1$age)

docs/sparkr.md

Lines changed: 1 addition & 1 deletion
@@ -718,4 +718,4 @@ You can inspect the search path in R with [`search()`](https://stat.ethz.ch/R-ma
 ## Upgrading to SparkR 3.0.0
 
 - The deprecated methods `sparkR.init`, `sparkRSQL.init`, `sparkRHive.init` have been removed. Use `sparkR.session` instead.
-- The deprecated methods `parquetFile`, `saveAsParquetFile`, `jsonFile`, `registerTempTable`, `createExternalTable`, `dropTempTable`, `unionAll` have been removed. Use `read.parquet`, `write.parquet`, `read.json`, `createOrReplaceTempView`, `createTable`, `dropTempView`, `union` instead.
+- The deprecated methods `parquetFile`, `saveAsParquetFile`, `jsonFile`, `registerTempTable`, `createExternalTable`, and `dropTempTable` have been removed. Use `read.parquet`, `write.parquet`, `read.json`, `createOrReplaceTempView`, `createTable`, `dropTempView` instead.

docs/sql-migration-guide-upgrade.md

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,8 @@ displayTitle: Spark SQL Upgrading Guide
 
 ## Upgrading From Spark SQL 2.4 to 3.0
 
+  - Since Spark 3.0, the Dataset and DataFrame API `unionAll` is not deprecated any more. It is an alias for `union`.
+
   - In PySpark, when creating a `SparkSession` with `SparkSession.builder.getOrCreate()`, if there is an existing `SparkContext`, the builder was trying to update the `SparkConf` of the existing `SparkContext` with configurations specified to the builder, but the `SparkContext` is shared by all `SparkSession`s, so we should not update them. Since 3.0, the builder comes to not update the configurations. This is the same behavior as Java/Scala API in 2.3 and above. If you want to update them, you need to update them prior to creating a `SparkSession`.
 
   - In Spark version 2.4 and earlier, the parser of JSON data source treats empty strings as null for some data types such as `IntegerType`. For `FloatType` and `DoubleType`, it fails on empty strings and throws exceptions. Since Spark 3.0, we disallow empty strings and will throw exceptions for data types except for `StringType` and `BinaryType`.

python/pyspark/sql/dataframe.py

Lines changed: 0 additions & 3 deletions
@@ -1470,10 +1470,7 @@ def unionAll(self, other):
         (that does deduplication of elements), use this function followed by :func:`distinct`.
 
         Also as standard in SQL, this function resolves columns by position (not by name).
-
-        .. note:: Deprecated in 2.0, use :func:`union` instead.
         """
-        warnings.warn("Deprecated in 2.0, use union instead.", DeprecationWarning)
         return self.union(other)
 
     @since(2.3)
sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala

Lines changed: 14 additions & 14 deletions
@@ -1810,20 +1810,6 @@ class Dataset[T] private[sql](
     Limit(Literal(n), logicalPlan)
   }
 
-  /**
-   * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
-   *
-   * This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union (that does
-   * deduplication of elements), use this function followed by a [[distinct]].
-   *
-   * Also as standard in SQL, this function resolves columns by position (not by name).
-   *
-   * @group typedrel
-   * @since 2.0.0
-   */
-  @deprecated("use union()", "2.0.0")
-  def unionAll(other: Dataset[T]): Dataset[T] = union(other)
-
   /**
    * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
   *
@@ -1860,6 +1846,20 @@ class Dataset[T] private[sql](
     CombineUnions(Union(logicalPlan, other.logicalPlan))
   }
 
+  /**
+   * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
+   * This is an alias for `union`.
+   *
+   * This is equivalent to `UNION ALL` in SQL. To do a SQL-style set union (that does
+   * deduplication of elements), use this function followed by a [[distinct]].
+   *
+   * Also as standard in SQL, this function resolves columns by position (not by name).
+   *
+   * @group typedrel
+   * @since 2.0.0
+   */
+  def unionAll(other: Dataset[T]): Dataset[T] = union(other)
+
   /**
    * Returns a new Dataset containing union of rows in this Dataset and another Dataset.
   *
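As the restored Scaladoc points out, `unionAll`, like `union`, resolves columns by position rather than by name. A small illustrative sketch of that behavior, again assuming a `spark` session is in scope (the data and column names are hypothetical), with `unionByName` shown for contrast:

```scala
import spark.implicits._

val left  = Seq((1, 10)).toDF("a", "b")
val right = Seq((2, 20)).toDF("b", "a")   // same types, columns named in the opposite order

// Positional resolution: right's first column feeds `a` and its second feeds `b`,
// regardless of the column names, so the result rows are (1, 10) and (2, 20).
left.unionAll(right).show()

// unionByName matches columns by name instead, so right's "a" (20) lands under `a`
// and its "b" (2) under `b`: rows (1, 10) and (20, 2).
left.unionByName(right).show()
```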

sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

Lines changed: 6 additions & 0 deletions
@@ -97,6 +97,12 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
       unionDF.agg(avg('key), max('key), min('key), sum('key)),
       Row(50.5, 100, 1, 25250) :: Nil
     )
+
+    // unionAll is an alias of union
+    val unionAllDF = testData.unionAll(testData).unionAll(testData)
+      .unionAll(testData).unionAll(testData)
+
+    checkAnswer(unionDF, unionAllDF)
   }
 
   test("union should union DataFrames with UDTs (SPARK-13410)") {
0 commit comments
