Commit f8dd15f

Merge pull request apache-spark-on-k8s#402 from palantir/rk/new-master
Merge upstream
2 parents 11651f9 + 2ddcdab

676 files changed, 19326 additions and 6221 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -81,6 +81,7 @@ work/
 .credentials
 dev/pr-deps
 docs/.jekyll-metadata
+*.crc
 
 # For Hive
 TempStatsStore/

LICENSE-binary

Lines changed: 0 additions & 2 deletions
@@ -228,7 +228,6 @@ org.apache.xbean:xbean-asm5-shaded
 com.squareup.okhttp3:logging-interceptor
 com.squareup.okhttp3:okhttp
 com.squareup.okio:okio
-net.java.dev.jets3t:jets3t
 org.apache.spark:spark-catalyst_2.11
 org.apache.spark:spark-kvstore_2.11
 org.apache.spark:spark-launcher_2.11
@@ -447,7 +446,6 @@ org.slf4j:jul-to-slf4j
 org.slf4j:slf4j-api
 org.slf4j:slf4j-log4j12
 com.github.scopt:scopt_2.11
-org.bouncycastle:bcprov-jdk15on
 
 core/src/main/resources/org/apache/spark/ui/static/dagre-d3.min.js
 core/src/main/resources/org/apache/spark/ui/static/*dataTables*

NOTICE

Lines changed: 22 additions & 0 deletions
@@ -4,3 +4,25 @@ Copyright 2014 and onwards The Apache Software Foundation.
 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).
 
+
+Export Control Notice
+---------------------
+
+This distribution includes cryptographic software. The country in which you currently reside may have
+restrictions on the import, possession, use, and/or re-export to another country, of encryption software.
+BEFORE using any encryption software, please check your country's laws, regulations and policies concerning
+the import, possession, or use, and re-export of encryption software, to see if this is permitted. See
+<http://www.wassenaar.org/> for more information.
+
+The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this
+software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software
+using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache
+Software Foundation distribution makes it eligible for export under the License Exception ENC Technology
+Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for
+both object code and source code.
+
+The following provides more details on the included cryptographic software:
+
+This software uses Apache Commons Crypto (https://commons.apache.org/proper/commons-crypto/) to
+support authentication, and encryption and decryption of data sent across the network between
+services.

NOTICE-binary

Lines changed: 25 additions & 21 deletions
@@ -5,6 +5,29 @@ This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).
 
 
+Export Control Notice
+---------------------
+
+This distribution includes cryptographic software. The country in which you currently reside may have
+restrictions on the import, possession, use, and/or re-export to another country, of encryption software.
+BEFORE using any encryption software, please check your country's laws, regulations and policies concerning
+the import, possession, or use, and re-export of encryption software, to see if this is permitted. See
+<http://www.wassenaar.org/> for more information.
+
+The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this
+software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software
+using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache
+Software Foundation distribution makes it eligible for export under the License Exception ENC Technology
+Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for
+both object code and source code.
+
+The following provides more details on the included cryptographic software:
+
+This software uses Apache Commons Crypto (https://commons.apache.org/proper/commons-crypto/) to
+support authentication, and encryption and decryption of data sent across the network between
+services.
+
+
 // ------------------------------------------------------------------
 // NOTICE file corresponding to the section 4d of The Apache License,
 // Version 2.0, in this case for
@@ -451,7 +474,7 @@ which has the following notices:
 PureJavaCrc32C from apache-hadoop-common http://hadoop.apache.org/
 (Apache 2.0 license)
 
-This library containd statically linked libstdc++. This inclusion is allowed by
+This library contains statically linked libstdc++. This inclusion is allowed by
 "GCC RUntime Library Exception"
 http://gcc.gnu.org/onlinedocs/libstdc++/manual/license.html
 
@@ -1137,25 +1160,6 @@ NonlinearMinimizer class in package breeze.optimize.proximal is distributed with
 2015, Debasish Das (Verizon), all rights reserved.
 
 
-=========================================================================
-== NOTICE file corresponding to section 4(d) of the Apache License,   ==
-== Version 2.0, in this case for the distribution of jets3t.          ==
-=========================================================================
-
-This product includes software developed by:
-
-The Apache Software Foundation (http://www.apache.org/).
-
-The ExoLab Project (http://www.exolab.org/)
-
-Sun Microsystems (http://www.sun.com/)
-
-Codehaus (http://castor.codehaus.org)
-
-Tatu Saloranta (http://wiki.fasterxml.com/TatuSaloranta)
-
-
 stream-lib
 Copyright 2016 AddThis
 
@@ -1167,4 +1171,4 @@ Apache Solr (http://lucene.apache.org/solr/)
 Copyright 2014 The Apache Software Foundation
 
 Apache Mahout (http://mahout.apache.org/)
-Copyright 2014 The Apache Software Foundation
+Copyright 2014 The Apache Software Foundation

R/pkg/NAMESPACE

Lines changed: 6 additions & 0 deletions
@@ -117,6 +117,7 @@ exportMethods("arrange",
               "dropna",
               "dtypes",
               "except",
+              "exceptAll",
               "explain",
               "fillna",
               "filter",
@@ -131,6 +132,7 @@ exportMethods("arrange",
               "hint",
               "insertInto",
               "intersect",
+              "intersectAll",
               "isLocal",
               "isStreaming",
               "join",
@@ -202,6 +204,8 @@ exportMethods("%<=>%",
               "approxQuantile",
               "array_contains",
               "array_distinct",
+              "array_except",
+              "array_intersect",
               "array_join",
               "array_max",
               "array_min",
@@ -210,6 +214,7 @@ exportMethods("%<=>%",
               "array_repeat",
               "array_sort",
               "arrays_overlap",
+              "array_union",
               "arrays_zip",
               "asc",
               "ascii",
@@ -353,6 +358,7 @@ exportMethods("%<=>%",
               "shiftLeft",
               "shiftRight",
               "shiftRightUnsigned",
+              "shuffle",
               "sd",
               "sign",
               "signum",

R/pkg/R/DataFrame.R

Lines changed: 59 additions & 2 deletions
@@ -588,7 +588,7 @@ setMethod("cache",
 #' \url{http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence}.
 #'
 #' @param x the SparkDataFrame to persist.
-#' @param newLevel storage level chosen for the persistance. See available options in
+#' @param newLevel storage level chosen for the persistence. See available options in
 #'   the description.
 #'
 #' @family SparkDataFrame functions
@@ -2848,6 +2848,35 @@ setMethod("intersect",
             dataFrame(intersected)
           })
 
+#' intersectAll
+#'
+#' Return a new SparkDataFrame containing rows in both this SparkDataFrame
+#' and another SparkDataFrame while preserving the duplicates.
+#' This is equivalent to \code{INTERSECT ALL} in SQL. Also as standard in
+#' SQL, this function resolves columns by position (not by name).
+#'
+#' @param x a SparkDataFrame.
+#' @param y a SparkDataFrame.
+#' @return A SparkDataFrame containing the result of the intersect all operation.
+#' @family SparkDataFrame functions
+#' @aliases intersectAll,SparkDataFrame,SparkDataFrame-method
+#' @rdname intersectAll
+#' @name intersectAll
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' df1 <- read.json(path)
+#' df2 <- read.json(path2)
+#' intersectAllDF <- intersectAll(df1, df2)
+#' }
+#' @note intersectAll since 2.4.0
+setMethod("intersectAll",
+          signature(x = "SparkDataFrame", y = "SparkDataFrame"),
+          function(x, y) {
+            intersected <- callJMethod(x@sdf, "intersectAll", y@sdf)
+            dataFrame(intersected)
+          })
+
 #' except
 #'
 #' Return a new SparkDataFrame containing rows in this SparkDataFrame
@@ -2867,7 +2896,6 @@ setMethod("intersect",
 #' df2 <- read.json(path2)
 #' exceptDF <- except(df, df2)
 #' }
-#' @rdname except
 #' @note except since 1.4.0
 setMethod("except",
           signature(x = "SparkDataFrame", y = "SparkDataFrame"),
@@ -2876,6 +2904,35 @@ setMethod("except",
             dataFrame(excepted)
           })
 
+#' exceptAll
+#'
+#' Return a new SparkDataFrame containing rows in this SparkDataFrame
+#' but not in another SparkDataFrame while preserving the duplicates.
+#' This is equivalent to \code{EXCEPT ALL} in SQL. Also as standard in
+#' SQL, this function resolves columns by position (not by name).
+#'
+#' @param x a SparkDataFrame.
+#' @param y a SparkDataFrame.
+#' @return A SparkDataFrame containing the result of the except all operation.
+#' @family SparkDataFrame functions
+#' @aliases exceptAll,SparkDataFrame,SparkDataFrame-method
+#' @rdname exceptAll
+#' @name exceptAll
+#' @examples
+#'\dontrun{
+#' sparkR.session()
+#' df1 <- read.json(path)
+#' df2 <- read.json(path2)
+#' exceptAllDF <- exceptAll(df1, df2)
+#' }
+#' @note exceptAll since 2.4.0
+setMethod("exceptAll",
+          signature(x = "SparkDataFrame", y = "SparkDataFrame"),
+          function(x, y) {
+            excepted <- callJMethod(x@sdf, "exceptAll", y@sdf)
+            dataFrame(excepted)
+          })
+
 #' Save the contents of SparkDataFrame to a data source.
 #'
 #' The data source is specified by the \code{source} and a set of options (...).
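
The new intersectAll()/exceptAll() wrappers differ from intersect()/except() only in keeping duplicate rows, matching SQL's INTERSECT ALL/EXCEPT ALL. A minimal sketch of the expected semantics, assuming an active SparkR session:

df1 <- createDataFrame(data.frame(v = c("a", "a", "a", "b")))
df2 <- createDataFrame(data.frame(v = c("a", "b")))

nrow(except(df1, df2))      # 0 -- except() first de-duplicates df1 to {a, b}
nrow(exceptAll(df1, df2))   # 2 -- one "a" removed per match; two copies remain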

R/pkg/R/SQLContext.R

Lines changed: 4 additions & 4 deletions
@@ -351,7 +351,7 @@ setMethod("toDF", signature(x = "RDD"),
 read.json.default <- function(path, ...) {
   sparkSession <- getSparkSession()
   options <- varargsToStrEnv(...)
-  # Allow the user to have a more flexible definiton of the text file path
+  # Allow the user to have a more flexible definition of the text file path
   paths <- as.list(suppressWarnings(normalizePath(path)))
   read <- callJMethod(sparkSession, "read")
   read <- callJMethod(read, "options", options)
@@ -421,7 +421,7 @@ jsonRDD <- function(sqlContext, rdd, schema = NULL, samplingRatio = 1.0) {
 read.orc <- function(path, ...) {
   sparkSession <- getSparkSession()
   options <- varargsToStrEnv(...)
-  # Allow the user to have a more flexible definiton of the ORC file path
+  # Allow the user to have a more flexible definition of the ORC file path
   path <- suppressWarnings(normalizePath(path))
   read <- callJMethod(sparkSession, "read")
   read <- callJMethod(read, "options", options)
@@ -442,7 +442,7 @@ read.orc <- function(path, ...) {
 read.parquet.default <- function(path, ...) {
   sparkSession <- getSparkSession()
   options <- varargsToStrEnv(...)
-  # Allow the user to have a more flexible definiton of the Parquet file path
+  # Allow the user to have a more flexible definition of the Parquet file path
   paths <- as.list(suppressWarnings(normalizePath(path)))
   read <- callJMethod(sparkSession, "read")
   read <- callJMethod(read, "options", options)
@@ -492,7 +492,7 @@ parquetFile <- function(x, ...) {
 read.text.default <- function(path, ...) {
   sparkSession <- getSparkSession()
   options <- varargsToStrEnv(...)
-  # Allow the user to have a more flexible definiton of the text file path
+  # Allow the user to have a more flexible definition of the text file path
   paths <- as.list(suppressWarnings(normalizePath(path)))
   read <- callJMethod(sparkSession, "read")
   read <- callJMethod(read, "options", options)
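
All four hunks fix the same comment typo; the normalizePath() call the comment describes is what lets users pass tilde-prefixed or relative paths. A standalone illustration, no Spark required (the file path is hypothetical):

# suppressWarnings() because normalizePath() warns when the file does not exist yet
suppressWarnings(normalizePath("~/data.json"))
# e.g. "/home/user/data.json" on Linux -- the tilde is expanded before the
# path string is handed to the JVM reader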

R/pkg/R/context.R

Lines changed: 6 additions & 7 deletions
@@ -43,7 +43,7 @@ getMinPartitions <- function(sc, minPartitions) {
 #' lines <- textFile(sc, "myfile.txt")
 #'}
 textFile <- function(sc, path, minPartitions = NULL) {
-  # Allow the user to have a more flexible definiton of the text file path
+  # Allow the user to have a more flexible definition of the text file path
   path <- suppressWarnings(normalizePath(path))
   # Convert a string vector of paths to a string containing comma separated paths
   path <- paste(path, collapse = ",")
@@ -71,7 +71,7 @@ textFile <- function(sc, path, minPartitions = NULL) {
 #' rdd <- objectFile(sc, "myfile")
 #'}
 objectFile <- function(sc, path, minPartitions = NULL) {
-  # Allow the user to have a more flexible definiton of the text file path
+  # Allow the user to have a more flexible definition of the text file path
   path <- suppressWarnings(normalizePath(path))
   # Convert a string vector of paths to a string containing comma separated paths
   path <- paste(path, collapse = ",")
@@ -138,11 +138,10 @@ parallelize <- function(sc, coll, numSlices = 1) {
 
   sizeLimit <- getMaxAllocationLimit(sc)
   objectSize <- object.size(coll)
+  len <- length(coll)
 
   # For large objects we make sure the size of each slice is also smaller than sizeLimit
-  numSerializedSlices <- max(numSlices, ceiling(objectSize / sizeLimit))
-  if (numSerializedSlices > length(coll))
-    numSerializedSlices <- length(coll)
+  numSerializedSlices <- min(len, max(numSlices, ceiling(objectSize / sizeLimit)))
 
   # Generate the slice ids to put each row
   # For instance, for numSerializedSlices of 22, length of 50
@@ -153,8 +152,8 @@ parallelize <- function(sc, coll, numSlices = 1) {
   splits <- if (numSerializedSlices > 0) {
     unlist(lapply(0: (numSerializedSlices - 1), function(x) {
       # nolint start
-      start <- trunc((x * length(coll)) / numSerializedSlices)
-      end <- trunc(((x + 1) * length(coll)) / numSerializedSlices)
+      start <- trunc((as.numeric(x) * len) / numSerializedSlices)
+      end <- trunc(((as.numeric(x) + 1) * len) / numSerializedSlices)
       # nolint end
       rep(start, end - start)
     }))
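
The parallelize() change does two things: it collapses the slice-count clamp into a single min(len, max(...)) expression, and it casts the slice index to double so x * len is computed in floating point. The cast matters because R integers are 32-bit; a standalone sketch of the overflow the old code could hit with large collections (the sizes are illustrative):

x <- 50000L              # a slice index, as produced by 0:(numSerializedSlices - 1)
len <- 50000L            # i.e. length(coll) for a 50,000-element collection
x * len                  # NA, with an "integer overflow" warning (exceeds 2^31 - 1)
as.numeric(x) * len      # 2.5e+09 -- correct in double precision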
