
Commit 6f62a0d

Merge pull request apache-spark-on-k8s#476 from palantir/rk/more-merge
Merge from upstream
2 parents: a51fa9c + a23eb35

File tree

771 files changed: +27,757 additions, −19,523 deletions


.circleci/config.yml

Lines changed: 8 additions & 13 deletions

@@ -2,7 +2,7 @@ version: 2

 defaults: &defaults
   docker:
-    - image: palantirtechnologies/circle-spark-base:0.1.0
+    - image: palantirtechnologies/circle-spark-base:0.1.3
   resource_class: xlarge
   environment: &defaults-environment
     TERM: dumb
@@ -128,7 +128,7 @@ jobs:
     <<: *defaults
     # Some part of the maven setup fails if there's no R, so we need to use the R image here
     docker:
-      - image: palantirtechnologies/circle-spark-r:0.1.0
+      - image: palantirtechnologies/circle-spark-r:0.1.3
     steps:
       # Saves us from recompiling every time...
       - restore_cache:
@@ -147,12 +147,7 @@ jobs:
           keys:
            - build-binaries-{{ checksum "build/mvn" }}-{{ checksum "build/sbt" }}
            - build-binaries-
-      - run: |
-          ./build/mvn -T1C -DskipTests -Phadoop-cloud -Phadoop-palantir -Pkinesis-asl -Pkubernetes -Pyarn -Psparkr install \
-            | tee -a "/tmp/mvn-install.log"
-      - store_artifacts:
-          path: /tmp/mvn-install.log
-          destination: mvn-install.log
+      - run: ./build/mvn -DskipTests -Phadoop-cloud -Phadoop-palantir -Pkinesis-asl -Pkubernetes -Pyarn -Psparkr install
       # Get sbt to run trivially, ensures its launcher is downloaded under build/
       - run: ./build/sbt -h || true
       - save_cache:
@@ -300,7 +295,7 @@ jobs:
     # depends on build-sbt, but we only need the assembly jars
     <<: *defaults
     docker:
-      - image: palantirtechnologies/circle-spark-python:0.1.0
+      - image: palantirtechnologies/circle-spark-python:0.1.3
     parallelism: 2
     steps:
       - *checkout-code
@@ -325,7 +320,7 @@ jobs:
     # depends on build-sbt, but we only need the assembly jars
     <<: *defaults
     docker:
-      - image: palantirtechnologies/circle-spark-r:0.1.0
+      - image: palantirtechnologies/circle-spark-r:0.1.3
     steps:
       - *checkout-code
       - attach_workspace:
@@ -438,7 +433,7 @@ jobs:
     <<: *defaults
     # Some part of the maven setup fails if there's no R, so we need to use the R image here
     docker:
-      - image: palantirtechnologies/circle-spark-r:0.1.0
+      - image: palantirtechnologies/circle-spark-r:0.1.3
     steps:
       - *checkout-code
       - restore_cache:
@@ -458,7 +453,7 @@ jobs:
   deploy-gradle:
     <<: *defaults
     docker:
-      - image: palantirtechnologies/circle-spark-r:0.1.0
+      - image: palantirtechnologies/circle-spark-r:0.1.3
     steps:
       - *checkout-code
       - *restore-gradle-wrapper-cache
@@ -470,7 +465,7 @@ jobs:
     <<: *defaults
     # Some part of the maven setup fails if there's no R, so we need to use the R image here
    docker:
-      - image: palantirtechnologies/circle-spark-r:0.1.0
+      - image: palantirtechnologies/circle-spark-r:0.1.3
     steps:
       # This cache contains the whole project after version was set and mvn package was called
       # Restoring first (and instead of checkout) as mvn versions:set mutates real source code...

.gitignore

Lines changed: 0 additions & 1 deletion

@@ -81,7 +81,6 @@ work/
 .credentials
 dev/pr-deps
 docs/.jekyll-metadata
-*.crc

 # For Hive
 TempStatsStore/

FORK.md

Lines changed: 2 additions & 0 deletions

@@ -29,3 +29,5 @@
 # Reverted
 * [SPARK-25908](https://issues.apache.org/jira/browse/SPARK-25908) - Removal of `monotonicall_increasing_id`, `toDegree`, `toRadians`, `approxCountDistinct`, `unionAll`
 * [SPARK-25862](https://issues.apache.org/jira/browse/SPARK-25862) - Removal of `unboundedPreceding`, `unboundedFollowing`, `currentRow`
+* [SPARK-26127](https://issues.apache.org/jira/browse/SPARK-26127) - Removal of deprecated setters from tree regression and classification models
+* [SPARK-25867](https://issues.apache.org/jira/browse/SPARK-25867) - Removal of KMeans computeCost

R/WINDOWS.md

Lines changed: 1 addition & 1 deletion

@@ -3,7 +3,7 @@
 To build SparkR on Windows, the following steps are required

 1. Install R (>= 3.1) and [Rtools](http://cran.r-project.org/bin/windows/Rtools/). Make sure to
-   include Rtools and R in `PATH`.
+   include Rtools and R in `PATH`. Note that support for R prior to version 3.4 is deprecated as of Spark 3.0.0.

 2. Install
    [JDK8](http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html) and set

R/pkg/DESCRIPTION

Lines changed: 1 addition & 1 deletion

@@ -15,7 +15,7 @@ URL: http://www.apache.org/ http://spark.apache.org/
 BugReports: http://spark.apache.org/contributing.html
 SystemRequirements: Java (== 8)
 Depends:
-    R (>= 3.0),
+    R (>= 3.1),
     methods
 Suggests:
     knitr,

R/pkg/R/DataFrame.R

Lines changed: 8 additions & 0 deletions

@@ -767,6 +767,14 @@ setMethod("repartition",
 #' using \code{spark.sql.shuffle.partitions} as number of partitions.}
 #'}
 #'
+#' At least one partition-by expression must be specified.
+#' When no explicit sort order is specified, "ascending nulls first" is assumed.
+#'
+#' Note that due to performance reasons this method uses sampling to estimate the ranges.
+#' Hence, the output may not be consistent, since sampling can return different values.
+#' The sample size can be controlled by the config
+#' \code{spark.sql.execution.rangeExchange.sampleSizePerPartition}.
+#'
 #' @param x a SparkDataFrame.
 #' @param numPartitions the number of partitions to use.
 #' @param col the column by which the range partitioning will be performed.
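The sampling-based range estimation described in the doc note above can be sketched in a few lines. This is an illustrative Python model only, not Spark's implementation; the function names `estimate_range_bounds` and `assign_partition` are invented for the sketch.

```python
import random

def estimate_range_bounds(values, num_partitions, sample_size=100):
    # Sample the input instead of scanning it all (the same idea the
    # spark.sql.execution.rangeExchange.sampleSizePerPartition config controls).
    sample = sorted(random.sample(values, min(sample_size, len(values))))
    # Pick num_partitions - 1 cut points at evenly spaced quantiles of the sample.
    step = len(sample) / num_partitions
    return [sample[int(step * i)] for i in range(1, num_partitions)]

def assign_partition(value, bounds):
    # Ascending order: partition i holds values <= bounds[i].
    for i, bound in enumerate(bounds):
        if value <= bound:
            return i
    return len(bounds)
```

Because the bounds come from a random sample, two runs can place borderline values into different partitions, which is exactly the inconsistency the doc note warns about.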

R/pkg/R/functions.R

Lines changed: 1 addition & 1 deletion

@@ -3370,7 +3370,7 @@ setMethod("flatten",
 #'
 #' @rdname column_collection_functions
 #' @aliases map_entries map_entries,Column-method
-#' @note map_entries since 2.4.0
+#' @note map_entries since 3.0.0
 setMethod("map_entries",
           signature(x = "Column"),
           function(x) {

R/pkg/R/stats.R

Lines changed: 2 additions & 2 deletions

@@ -109,7 +109,7 @@ setMethod("corr",
 #'
 #' Finding frequent items for columns, possibly with false positives.
 #' Using the frequent element count algorithm described in
-#' \url{http://dx.doi.org/10.1145/762471.762473}, proposed by Karp, Schenker, and Papadimitriou.
+#' \url{https://doi.org/10.1145/762471.762473}, proposed by Karp, Schenker, and Papadimitriou.
 #'
 #' @param x A SparkDataFrame.
 #' @param cols A vector column names to search frequent items in.
@@ -143,7 +143,7 @@ setMethod("freqItems", signature(x = "SparkDataFrame", cols = "character"),
 #' *exact* rank of x is close to (p * N). More precisely,
 #' floor((p - err) * N) <= rank(x) <= ceil((p + err) * N).
 #' This method implements a variation of the Greenwald-Khanna algorithm (with some speed
-#' optimizations). The algorithm was first present in [[http://dx.doi.org/10.1145/375663.375670
+#' optimizations). The algorithm was first present in [[https://doi.org/10.1145/375663.375670
 #' Space-efficient Online Computation of Quantile Summaries]] by Greenwald and Khanna.
 #' Note that NA values will be ignored in numerical columns before calculation. For
 #' columns only containing NA values, an empty list is returned.
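The frequent element count algorithm cited in the `freqItems` doc above (Karp, Schenker, and Papadimitriou; essentially the counter-based Misra-Gries sketch) can be illustrated with a small Python model. This is a sketch of the general technique, not Spark's `freqItems` implementation, and the name `freq_items` is chosen for the example.

```python
def freq_items(stream, k):
    """One-pass sketch with at most k - 1 counters: every item occurring
    more than len(stream) / k times is guaranteed to be returned, but
    false positives are possible (hence Spark's 'possibly with false
    positives' caveat)."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping any that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return set(counters)
```

The one-sided guarantee is the useful part: truly frequent items always survive, so the result only needs to be filtered, never re-scanned for misses.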

R/pkg/inst/profile/general.R

Lines changed: 4 additions & 0 deletions

@@ -16,6 +16,10 @@
 #

 .First <- function() {
+  if (utils::compareVersion(paste0(R.version$major, ".", R.version$minor), "3.4.0") == -1) {
+    warning("Support for R prior to version 3.4 is deprecated since Spark 3.0.0")
+  }
+
   packageDir <- Sys.getenv("SPARKR_PACKAGE_DIR")
   dirs <- strsplit(packageDir, ",")[[1]]
   .libPaths(c(dirs, .libPaths()))
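The guard added above relies on `utils::compareVersion`, which compares dotted version strings component-wise as integers. One reading of those semantics (a hedged Python sketch, not the R implementation; the tie-breaking rule for versions of different lengths is an assumption here) looks like this:

```python
def compare_version(a, b):
    # Split on dots and compare components numerically, left to right,
    # returning -1, 0, or 1 like utils::compareVersion.
    pa = [int(x) for x in a.split(".")]
    pb = [int(x) for x in b.split(".")]
    for x, y in zip(pa, pb):
        if x != y:
            return 1 if x > y else -1
    # Assumed tie-break: equal prefixes, longer version sorts higher.
    if len(pa) != len(pb):
        return 1 if len(pa) > len(pb) else -1
    return 0
```

A numeric comparison is what makes the guard correct where a string comparison would not be: lexically "3.10" sorts before "3.4", but component-wise 10 > 4.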

R/pkg/inst/profile/shell.R

Lines changed: 4 additions & 0 deletions

@@ -16,6 +16,10 @@
 #

 .First <- function() {
+  if (utils::compareVersion(paste0(R.version$major, ".", R.version$minor), "3.4.0") == -1) {
+    warning("Support for R prior to version 3.4 is deprecated since Spark 3.0.0")
+  }
+
   home <- Sys.getenv("SPARK_HOME")
   .libPaths(c(file.path(home, "R", "lib"), .libPaths()))
   Sys.setenv(NOAWT = 1)
