Commit 1a5e460

kiszk authored and ueshin committed
[SPARK-23913][SQL] Add array_intersect function
## What changes were proposed in this pull request?

The PR adds the SQL function `array_intersect`. Its behavior follows Presto's equivalent function. The function returns an array of the elements in the intersection of array1 and array2.

Note: The order of elements in the result is not defined.

## How was this patch tested?

Added unit tests.

Author: Kazuaki Ishizaki <[email protected]>

Closes apache#21102 from kiszk/SPARK-23913.
1 parent 35700bb commit 1a5e460

6 files changed

+515
-68
lines changed

python/pyspark/sql/functions.py

Lines changed: 19 additions & 0 deletions
@@ -2033,6 +2033,25 @@ def array_distinct(col):
     return Column(sc._jvm.functions.array_distinct(_to_java_column(col)))
 
 
+@ignore_unicode_prefix
+@since(2.4)
+def array_intersect(col1, col2):
+    """
+    Collection function: returns an array of the elements in the intersection of col1 and col2,
+    without duplicates.
+
+    :param col1: name of column containing array
+    :param col2: name of column containing array
+
+    >>> from pyspark.sql import Row
+    >>> df = spark.createDataFrame([Row(c1=["b", "a", "c"], c2=["c", "d", "a", "f"])])
+    >>> df.select(array_intersect(df.c1, df.c2)).collect()
+    [Row(array_intersect(c1, c2)=[u'a', u'c'])]
+    """
+    sc = SparkContext._active_spark_context
+    return Column(sc._jvm.functions.array_intersect(_to_java_column(col1), _to_java_column(col2)))
+
+
 @ignore_unicode_prefix
 @since(2.4)
 def array_union(col1, col2):
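
For readers without a Spark session at hand, the semantics described in the commit message and docstring (elements common to both arrays, without duplicates, order not defined) can be approximated in plain Python. This is only a sketch: `array_intersect_model` is a hypothetical helper, not part of PySpark, it assumes hashable elements, and it happens to keep the order of the first list, which the real function does not promise. It says nothing about how Spark handles nulls.

```python
def array_intersect_model(left, right):
    """Toy model of array_intersect: elements found in both lists, each once.

    Hypothetical helper for illustration only; not Spark's implementation.
    """
    right_set = set(right)  # assumes elements are hashable
    seen = set()
    result = []
    for item in left:
        # Keep an element only if it is in both inputs and not yet emitted.
        if item in right_set and item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Same data as the docstring example in the hunk.
c1 = ["b", "a", "c"]
c2 = ["c", "d", "a", "f"]
common = array_intersect_model(c1, c2)

# Compare as sets, because the commit message leaves the result order undefined.
assert set(common) == {"a", "c"}
# The docstring states that the result contains no duplicates.
assert len(common) == len(set(common))
```

Comparing as sets rather than lists mirrors the note in the commit message: a test that depends on a specific element order would be asserting more than the function guarantees.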

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala

Lines changed: 1 addition & 0 deletions
@@ -411,6 +411,7 @@ object FunctionRegistry {
     expression[CreateArray]("array"),
     expression[ArrayContains]("array_contains"),
     expression[ArraysOverlap]("arrays_overlap"),
+    expression[ArrayIntersect]("array_intersect"),
     expression[ArrayJoin]("array_join"),
     expression[ArrayPosition]("array_position"),
     expression[ArraySort]("array_sort"),