Commit 74dbd37

Author: Sumedh Wale (committed)

[SNAPPYDATA] honor existing PYSPARK_PYTHON in build
- more fixes to URL references in sparkr-vignettes and others
- updated copyright to 2022 in UI information

1 parent 1fb3673 commit 74dbd37

File tree: 6 files changed (+30, -27 lines)

R/pkg/DESCRIPTION

Lines changed: 2 additions & 2 deletions
@@ -11,8 +11,8 @@ Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
                     email = "[email protected]"),
              person(family = "The Apache Software Foundation", role = c("aut", "cph")))
 License: Apache License (== 2.0)
-URL: http://www.apache.org/ http://spark.apache.org/
-BugReports: http://spark.apache.org/contributing.html
+URL: https://www.apache.org/ https://spark.apache.org/
+BugReports: https://spark.apache.org/contributing.html
 Depends:
     R (>= 3.0),
     methods

R/pkg/vignettes/sparkr-vignettes.Rmd

Lines changed: 11 additions & 11 deletions
@@ -46,7 +46,7 @@ Sys.setenv("_JAVA_OPTIONS" = paste("-XX:-UsePerfData", old_java_opt, sep = " "))

 ## Overview

-SparkR is an R package that provides a light-weight frontend to use Apache Spark from R. With Spark `r packageVersion("SparkR")`, SparkR provides a distributed data frame implementation that supports data processing operations like selection, filtering, aggregation etc. and distributed machine learning using [MLlib](http://spark.apache.org/mllib/).
+SparkR is an R package that provides a light-weight frontend to use Apache Spark from R. With Spark `r packageVersion("SparkR")`, SparkR provides a distributed data frame implementation that supports data processing operations like selection, filtering, aggregation etc. and distributed machine learning using [MLlib](https://spark.apache.org/mllib/).

 ## Getting Started

@@ -132,7 +132,7 @@ sparkR.session.stop()

 Different from many other R packages, to use SparkR, you need an additional installation of Apache Spark. The Spark installation will be used to run a backend process that will compile and execute SparkR programs.

-After installing the SparkR package, you can call `sparkR.session` as explained in the previous section to start and it will check for the Spark installation. If you are working with SparkR from an interactive shell (eg. R, RStudio) then Spark is downloaded and cached automatically if it is not found. Alternatively, we provide an easy-to-use function `install.spark` for running this manually. If you don't have Spark installed on the computer, you may download it from [Apache Spark Website](http://spark.apache.org/downloads.html).
+After installing the SparkR package, you can call `sparkR.session` as explained in the previous section to start and it will check for the Spark installation. If you are working with SparkR from an interactive shell (eg. R, RStudio) then Spark is downloaded and cached automatically if it is not found. Alternatively, we provide an easy-to-use function `install.spark` for running this manually. If you don't have Spark installed on the computer, you may download it from [Apache Spark Website](https://spark.apache.org/downloads.html).

 ```{r, eval=FALSE}
 install.spark()
@@ -147,7 +147,7 @@ sparkR.session(sparkHome = "/HOME/spark")
 ### Spark Session {#SetupSparkSession}


-In addition to `sparkHome`, many other options can be specified in `sparkR.session`. For a complete list, see [Starting up: SparkSession](http://spark.apache.org/docs/latest/sparkr.html#starting-up-sparksession) and [SparkR API doc](http://spark.apache.org/docs/latest/api/R/sparkR.session.html).
+In addition to `sparkHome`, many other options can be specified in `sparkR.session`. For a complete list, see [Starting up: SparkSession](https://spark.apache.org/docs/latest/sparkr.html#starting-up-sparksession) and [SparkR API doc](https://spark.apache.org/docs/2.1.3/api/R/sparkR.session.html).

 In particular, the following Spark driver properties can be set in `sparkConfig`.

@@ -169,15 +169,15 @@ sparkR.session(spark.sql.warehouse.dir = spark_warehouse_path)


 #### Cluster Mode
-SparkR can connect to remote Spark clusters. [Cluster Mode Overview](http://spark.apache.org/docs/latest/cluster-overview.html) is a good introduction to different Spark cluster modes.
+SparkR can connect to remote Spark clusters. [Cluster Mode Overview](https://spark.apache.org/docs/latest/cluster-overview.html) is a good introduction to different Spark cluster modes.

 When connecting SparkR to a remote Spark cluster, make sure that the Spark version and Hadoop version on the machine match the corresponding versions on the cluster. Current SparkR package is compatible with
 ```{r, echo=FALSE, tidy = TRUE}
 paste("Spark", packageVersion("SparkR"))
 ```
 It should be used both on the local computer and on the remote cluster.

-To connect, pass the URL of the master node to `sparkR.session`. A complete list can be seen in [Spark Master URLs](http://spark.apache.org/docs/latest/submitting-applications.html#master-urls).
+To connect, pass the URL of the master node to `sparkR.session`. A complete list can be seen in [Spark Master URLs](https://spark.apache.org/docs/latest/submitting-applications.html#master-urls).
 For example, to connect to a local standalone Spark master, we can call

 ```{r, eval=FALSE}
@@ -208,7 +208,7 @@ The general method for creating `SparkDataFrame` from data sources is `read.df`.
 sparkR.session(sparkPackages = "com.databricks:spark-avro_2.11:3.0.0")
 ```

-We can see how to use data sources using an example CSV input file. For more information please refer to SparkR [read.df](https://spark.apache.org/docs/latest/api/R/read.df.html) API documentation.
+We can see how to use data sources using an example CSV input file. For more information please refer to SparkR [read.df](https://spark.apache.org/docs/2.1.3/api/R/read.df.html) API documentation.
 ```{r, eval=FALSE}
 df <- read.df(csvPath, "csv", header = "true", inferSchema = "true", na.strings = "NA")
 ```
@@ -297,7 +297,7 @@ printSchema(carsDF)

 #### Selecting rows, columns

-SparkDataFrames support a number of functions to do structured data processing. Here we include some basic examples and a complete list can be found in the [API](https://spark.apache.org/docs/latest/api/R/index.html) docs:
+SparkDataFrames support a number of functions to do structured data processing. Here we include some basic examples and a complete list can be found in the [API](https://spark.apache.org/docs/2.1.3/api/R/index.html) docs:

 You can also pass in column name as strings.
 ```{r}
@@ -842,7 +842,7 @@ perplexity

 #### Alternating Least Squares

-`spark.als` learns latent factors in [collaborative filtering](https://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering) via [alternating least squares](http://dl.acm.org/citation.cfm?id=1608614).
+`spark.als` learns latent factors in [collaborative filtering](https://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering) via [alternating least squares](https://dl.acm.org/doi/10.1109/MC.2009.263).

 There are multiple options that can be configured in `spark.als`, including `rank`, `reg`, and `nonnegative`. For a complete list, refer to the help file.

@@ -979,11 +979,11 @@ env | map

 ## References

-* [Spark Cluster Mode Overview](http://spark.apache.org/docs/latest/cluster-overview.html)
+* [Spark Cluster Mode Overview](https://spark.apache.org/docs/latest/cluster-overview.html)

-* [Submitting Spark Applications](http://spark.apache.org/docs/latest/submitting-applications.html)
+* [Submitting Spark Applications](https://spark.apache.org/docs/latest/submitting-applications.html)

-* [Machine Learning Library Guide (MLlib)](http://spark.apache.org/docs/latest/ml-guide.html)
+* [Machine Learning Library Guide (MLlib)](https://spark.apache.org/docs/latest/ml-guide.html)

 * [SparkR: Scaling R Programs with Spark](https://people.csail.mit.edu/matei/papers/2016/sigmod_sparkr.pdf), Shivaram Venkataraman, Zongheng Yang, Davies Liu, Eric Liang, Hossein Falaki, Xiangrui Meng, Reynold Xin, Ali Ghodsi, Michael Franklin, Ion Stoica, and Matei Zaharia. SIGMOD 2016. June 2016.


build.gradle

Lines changed: 13 additions & 10 deletions
@@ -280,18 +280,21 @@ allprojects {
 }

 // set python2 for pyspark if python3 version is an unsupported one
-String sparkPython = 'python'
-def checkResult = exec {
-  ignoreExitValue = true
-  commandLine 'sh', '-c', 'python --version 2>/dev/null | grep -Eq "( 3\\.[0-7])|( 2\\.)"'
-}
-if (checkResult.exitValue != 0) {
-  checkResult = exec {
+String sparkPython = System.getenv('PYSPARK_PYTHON')
+if (sparkPython == null || sparkPython.isEmpty()) {
+  sparkPython = 'python'
+  def checkResult = exec {
     ignoreExitValue = true
-    commandLine 'sh', '-c', 'python2 --version >/dev/null 2>&1'
+    commandLine 'sh', '-c', 'python --version 2>/dev/null | grep -Eq "( 3\\.[0-7])|( 2\\.)"'
   }
-  if (checkResult.exitValue == 0) {
-    sparkPython = 'python2'
+  if (checkResult.exitValue != 0) {
+    checkResult = exec {
+      ignoreExitValue = true
+      commandLine 'sh', '-c', 'python2 --version >/dev/null 2>&1'
+    }
+    if (checkResult.exitValue == 0) {
+      sparkPython = 'python2'
+    }
   }
 }

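With this change, a non-empty PYSPARK_PYTHON environment variable is honored as-is, and the python/python2 probing above runs only when the variable is unset or empty. A minimal usage sketch follows; the gradle wrapper invocation, task name, and interpreter path are illustrative assumptions, not taken from this commit:

```sh
# Hypothetical invocation from the repository root.
export PYSPARK_PYTHON=/usr/bin/python3.6   # taken as-is, no version probing
./gradlew build

# If PYSPARK_PYTHON is unset or empty, the build keeps plain `python` when
# `python --version` matches 2.x or 3.0-3.7, and otherwise falls back to
# `python2` if that interpreter is available.
unset PYSPARK_PYTHON
./gradlew build
```
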
core/src/main/scala/org/apache/spark/ui/UIUtils.scala

Lines changed: 2 additions & 2 deletions
@@ -636,7 +636,7 @@ private[spark] object UIUtils extends Logging {
       <p>
         <strong>Project SnappyData<sup>&trade;</sup>
           - Enterprise Edition</strong> <br />
-        <br />&copy; 2017-2020 TIBCO<sup>&reg;</sup> Software Inc. All rights reserved.
+        <br />&copy; 2017-2022 TIBCO<sup>&reg;</sup> Software Inc. All rights reserved.
         <br />This program is protected by copyright law.
       </p>
       <p>
@@ -659,7 +659,7 @@ private[spark] object UIUtils extends Logging {
     } else {
       <p>
         <strong>Project SnappyData<sup>&trade;</sup> - Community Edition </strong> <br />
-        <br />&copy; 2017-2020 TIBCO<sup>&reg;</sup> Software Inc. All rights reserved.
+        <br />&copy; 2017-2022 TIBCO<sup>&reg;</sup> Software Inc. All rights reserved.
         <br />This program is protected by copyright law.
       </p>
       <p>


docs/ml-collaborative-filtering.md

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ missing entries of a user-item association matrix. `spark.ml` currently support
 model-based collaborative filtering, in which users and products are described
 by a small set of latent factors that can be used to predict missing entries.
 `spark.ml` uses the [alternating least squares
-(ALS)](http://dl.acm.org/citation.cfm?id=1608614)
+(ALS)](https://dl.acm.org/doi/10.1109/MC.2009.263)
 algorithm to learn these latent factors. The implementation in `spark.ml` has the
 following parameters:

docs/mllib-collaborative-filtering.md

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ missing entries of a user-item association matrix. `spark.mllib` currently supp
 model-based collaborative filtering, in which users and products are described
 by a small set of latent factors that can be used to predict missing entries.
 `spark.mllib` uses the [alternating least squares
-(ALS)](http://dl.acm.org/citation.cfm?id=1608614)
+(ALS)](https://dl.acm.org/doi/10.1109/MC.2009.263)
 algorithm to learn these latent factors. The implementation in `spark.mllib` has the
 following parameters:

