
Commit c5daccb

katrinleinweber authored and srowen committed
[MINOR] Update all DOI links to preferred resolver
## What changes were proposed in this pull request?

The DOI foundation recommends [this new resolver](https://www.doi.org/doi_handbook/3_Resolution.html#3.8). Accordingly, this PR re-`sed`s all static DOI links ;-)

## How was this patch tested?

It wasn't, since it seems as safe as a "[typo fix](https://spark.apache.org/contributing.html)". In case any of the files are included from other projects and should be updated there, please let me know.

Closes apache#23129 from katrinleinweber/resolve-DOIs-securely.

Authored-by: Katrin Leinweber <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
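The PR body says the links were re-`sed`-ed but does not show the command used. A minimal sketch of such a rewrite, assuming an extended-regex `sed` invocation and a hypothetical sample file (the exact regex, flags, and file scope in the real PR are unknown):

```shell
# Hypothetical reconstruction of the link rewrite; the actual command used in
# the PR is not shown, so the regex and flags below are assumptions.
tmp="$(mktemp -d)"
cat > "$tmp/doc.md" <<'EOF'
See <a href="http://dx.doi.org/10.1145/2452376.2452456">here</a> and
[PAVA](http://doi.org/10.1198/TECH.2010.10111).
EOF

# Rewrite both the legacy dx.doi.org host and plain-http doi.org links to the
# recommended https://doi.org resolver. -i.bak keeps a backup file and is
# accepted by both GNU and BSD sed.
sed -i.bak -E 's#http://(dx\.)?doi\.org/#https://doi.org/#g' "$tmp/doc.md"

# Sanity check: no insecure DOI links should remain after the rewrite.
if grep -E 'http://(dx\.)?doi\.org' "$tmp/doc.md"; then
  echo "insecure DOI links remain"
else
  echo "all DOI links updated"
fi
```

Running the same `grep` over the repository before committing would be one way to confirm no `http://` DOI links were missed, which the PR itself notes was not done.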
1 parent 41d5aae commit c5daccb

File tree

29 files changed: +54 −54 lines changed

R/pkg/R/stats.R

Lines changed: 2 additions & 2 deletions
@@ -109,7 +109,7 @@ setMethod("corr",
 #'
 #' Finding frequent items for columns, possibly with false positives.
 #' Using the frequent element count algorithm described in
-#' \url{http://dx.doi.org/10.1145/762471.762473}, proposed by Karp, Schenker, and Papadimitriou.
+#' \url{https://doi.org/10.1145/762471.762473}, proposed by Karp, Schenker, and Papadimitriou.
 #'
 #' @param x A SparkDataFrame.
 #' @param cols A vector column names to search frequent items in.
@@ -143,7 +143,7 @@ setMethod("freqItems", signature(x = "SparkDataFrame", cols = "character"),
 #' *exact* rank of x is close to (p * N). More precisely,
 #' floor((p - err) * N) <= rank(x) <= ceil((p + err) * N).
 #' This method implements a variation of the Greenwald-Khanna algorithm (with some speed
-#' optimizations). The algorithm was first present in [[http://dx.doi.org/10.1145/375663.375670
+#' optimizations). The algorithm was first present in [[https://doi.org/10.1145/375663.375670
 #' Space-efficient Online Computation of Quantile Summaries]] by Greenwald and Khanna.
 #' Note that NA values will be ignored in numerical columns before calculation. For
 #' columns only containing NA values, an empty list is returned.

core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala

Lines changed: 3 additions & 3 deletions
@@ -952,7 +952,7 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V)])
    *
    * The algorithm used is based on streamlib's implementation of "HyperLogLog in Practice:
    * Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm", available
-   * <a href="http://dx.doi.org/10.1145/2452376.2452456">here</a>.
+   * <a href="https://doi.org/10.1145/2452376.2452456">here</a>.
    *
    * @param relativeSD Relative accuracy. Smaller values create counters that require more space.
    *                   It must be greater than 0.000017.
@@ -969,7 +969,7 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V)])
    *
    * The algorithm used is based on streamlib's implementation of "HyperLogLog in Practice:
    * Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm", available
-   * <a href="http://dx.doi.org/10.1145/2452376.2452456">here</a>.
+   * <a href="https://doi.org/10.1145/2452376.2452456">here</a>.
    *
    * @param relativeSD Relative accuracy. Smaller values create counters that require more space.
    *                   It must be greater than 0.000017.
@@ -985,7 +985,7 @@ class JavaPairRDD[K, V](val rdd: RDD[(K, V)])
    *
    * The algorithm used is based on streamlib's implementation of "HyperLogLog in Practice:
    * Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm", available
-   * <a href="http://dx.doi.org/10.1145/2452376.2452456">here</a>.
+   * <a href="https://doi.org/10.1145/2452376.2452456">here</a>.
    *
    * @param relativeSD Relative accuracy. Smaller values create counters that require more space.
    *                   It must be greater than 0.000017.

core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala

Lines changed: 1 addition & 1 deletion
@@ -685,7 +685,7 @@ trait JavaRDDLike[T, This <: JavaRDDLike[T, This]] extends Serializable {
    *
    * The algorithm used is based on streamlib's implementation of "HyperLogLog in Practice:
    * Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm", available
-   * <a href="http://dx.doi.org/10.1145/2452376.2452456">here</a>.
+   * <a href="https://doi.org/10.1145/2452376.2452456">here</a>.
    *
    * @param relativeSD Relative accuracy. Smaller values create counters that require more space.
    *                   It must be greater than 0.000017.

core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala

Lines changed: 4 additions & 4 deletions
@@ -394,7 +394,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
    *
    * The algorithm used is based on streamlib's implementation of "HyperLogLog in Practice:
    * Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm", available
-   * <a href="http://dx.doi.org/10.1145/2452376.2452456">here</a>.
+   * <a href="https://doi.org/10.1145/2452376.2452456">here</a>.
    *
    * The relative accuracy is approximately `1.054 / sqrt(2^p)`. Setting a nonzero (`sp` is
    * greater than `p`) would trigger sparse representation of registers, which may reduce the
@@ -436,7 +436,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
    *
    * The algorithm used is based on streamlib's implementation of "HyperLogLog in Practice:
    * Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm", available
-   * <a href="http://dx.doi.org/10.1145/2452376.2452456">here</a>.
+   * <a href="https://doi.org/10.1145/2452376.2452456">here</a>.
    *
    * @param relativeSD Relative accuracy. Smaller values create counters that require more space.
    *                   It must be greater than 0.000017.
@@ -456,7 +456,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
    *
    * The algorithm used is based on streamlib's implementation of "HyperLogLog in Practice:
    * Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm", available
-   * <a href="http://dx.doi.org/10.1145/2452376.2452456">here</a>.
+   * <a href="https://doi.org/10.1145/2452376.2452456">here</a>.
    *
    * @param relativeSD Relative accuracy. Smaller values create counters that require more space.
    *                   It must be greater than 0.000017.
@@ -473,7 +473,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
    *
    * The algorithm used is based on streamlib's implementation of "HyperLogLog in Practice:
    * Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm", available
-   * <a href="http://dx.doi.org/10.1145/2452376.2452456">here</a>.
+   * <a href="https://doi.org/10.1145/2452376.2452456">here</a>.
    *
    * @param relativeSD Relative accuracy. Smaller values create counters that require more space.
    *                   It must be greater than 0.000017.

core/src/main/scala/org/apache/spark/rdd/RDD.scala

Lines changed: 2 additions & 2 deletions
@@ -1258,7 +1258,7 @@ abstract class RDD[T: ClassTag](
    *
    * The algorithm used is based on streamlib's implementation of "HyperLogLog in Practice:
    * Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm", available
-   * <a href="http://dx.doi.org/10.1145/2452376.2452456">here</a>.
+   * <a href="https://doi.org/10.1145/2452376.2452456">here</a>.
    *
    * The relative accuracy is approximately `1.054 / sqrt(2^p)`. Setting a nonzero (`sp` is greater
    * than `p`) would trigger sparse representation of registers, which may reduce the memory
@@ -1290,7 +1290,7 @@ abstract class RDD[T: ClassTag](
    *
    * The algorithm used is based on streamlib's implementation of "HyperLogLog in Practice:
    * Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm", available
-   * <a href="http://dx.doi.org/10.1145/2452376.2452456">here</a>.
+   * <a href="https://doi.org/10.1145/2452376.2452456">here</a>.
    *
    * @param relativeSD Relative accuracy. Smaller values create counters that require more space.
    *                   It must be greater than 0.000017.

docs/ml-classification-regression.md

Lines changed: 2 additions & 2 deletions
@@ -941,9 +941,9 @@ Essentially isotonic regression is a
 best fitting the original data points.
 
 We implement a
-[pool adjacent violators algorithm](http://doi.org/10.1198/TECH.2010.10111)
+[pool adjacent violators algorithm](https://doi.org/10.1198/TECH.2010.10111)
 which uses an approach to
-[parallelizing isotonic regression](http://doi.org/10.1007/978-3-642-99789-1_10).
+[parallelizing isotonic regression](https://doi.org/10.1007/978-3-642-99789-1_10).
 The training input is a DataFrame which contains three columns
 label, features and weight. Additionally, IsotonicRegression algorithm has one
 optional parameter called $isotonic$ defaulting to true.

docs/ml-collaborative-filtering.md

Lines changed: 2 additions & 2 deletions
@@ -41,7 +41,7 @@ for example, users giving ratings to movies.
 
 It is common in many real-world use cases to only have access to *implicit feedback* (e.g. views,
 clicks, purchases, likes, shares etc.). The approach used in `spark.ml` to deal with such data is taken
-from [Collaborative Filtering for Implicit Feedback Datasets](http://dx.doi.org/10.1109/ICDM.2008.22).
+from [Collaborative Filtering for Implicit Feedback Datasets](https://doi.org/10.1109/ICDM.2008.22).
 Essentially, instead of trying to model the matrix of ratings directly, this approach treats the data
 as numbers representing the *strength* in observations of user actions (such as the number of clicks,
 or the cumulative duration someone spent viewing a movie). Those numbers are then related to the level of
@@ -55,7 +55,7 @@ We scale the regularization parameter `regParam` in solving each least squares p
 the number of ratings the user generated in updating user factors,
 or the number of ratings the product received in updating product factors.
 This approach is named "ALS-WR" and discussed in the paper
-"[Large-Scale Parallel Collaborative Filtering for the Netflix Prize](http://dx.doi.org/10.1007/978-3-540-68880-8_32)".
+"[Large-Scale Parallel Collaborative Filtering for the Netflix Prize](https://doi.org/10.1007/978-3-540-68880-8_32)".
 It makes `regParam` less dependent on the scale of the dataset, so we can apply the
 best parameter learned from a sampled subset to the full dataset and expect similar performance.

docs/ml-frequent-pattern-mining.md

Lines changed: 4 additions & 4 deletions
@@ -18,15 +18,15 @@ for more information.
 ## FP-Growth
 
 The FP-growth algorithm is described in the paper
-[Han et al., Mining frequent patterns without candidate generation](http://dx.doi.org/10.1145/335191.335372),
+[Han et al., Mining frequent patterns without candidate generation](https://doi.org/10.1145/335191.335372),
 where "FP" stands for frequent pattern.
 Given a dataset of transactions, the first step of FP-growth is to calculate item frequencies and identify frequent items.
 Different from [Apriori-like](http://en.wikipedia.org/wiki/Apriori_algorithm) algorithms designed for the same purpose,
 the second step of FP-growth uses a suffix tree (FP-tree) structure to encode transactions without generating candidate sets
 explicitly, which are usually expensive to generate.
 After the second step, the frequent itemsets can be extracted from the FP-tree.
 In `spark.mllib`, we implemented a parallel version of FP-growth called PFP,
-as described in [Li et al., PFP: Parallel FP-growth for query recommendation](http://dx.doi.org/10.1145/1454008.1454027).
+as described in [Li et al., PFP: Parallel FP-growth for query recommendation](https://doi.org/10.1145/1454008.1454027).
 PFP distributes the work of growing FP-trees based on the suffixes of transactions,
 and hence is more scalable than a single-machine implementation.
 We refer users to the papers for more details.
@@ -90,7 +90,7 @@ Refer to the [R API docs](api/R/spark.fpGrowth.html) for more details.
 
 PrefixSpan is a sequential pattern mining algorithm described in
 [Pei et al., Mining Sequential Patterns by Pattern-Growth: The
-PrefixSpan Approach](http://dx.doi.org/10.1109%2FTKDE.2004.77). We refer
+PrefixSpan Approach](https://doi.org/10.1109%2FTKDE.2004.77). We refer
 the reader to the referenced paper for formalizing the sequential
 pattern mining problem.
 
@@ -137,4 +137,4 @@ Refer to the [R API docs](api/R/spark.prefixSpan.html) for more details.
 {% include_example r/ml/prefixSpan.R %}
 </div>
 
-</div>
+</div>

docs/mllib-collaborative-filtering.md

Lines changed: 2 additions & 2 deletions
@@ -37,7 +37,7 @@ for example, users giving ratings to movies.
 
 It is common in many real-world use cases to only have access to *implicit feedback* (e.g. views,
 clicks, purchases, likes, shares etc.). The approach used in `spark.mllib` to deal with such data is taken
-from [Collaborative Filtering for Implicit Feedback Datasets](http://dx.doi.org/10.1109/ICDM.2008.22).
+from [Collaborative Filtering for Implicit Feedback Datasets](https://doi.org/10.1109/ICDM.2008.22).
 Essentially, instead of trying to model the matrix of ratings directly, this approach treats the data
 as numbers representing the *strength* in observations of user actions (such as the number of clicks,
 or the cumulative duration someone spent viewing a movie). Those numbers are then related to the level of
@@ -51,7 +51,7 @@ Since v1.1, we scale the regularization parameter `lambda` in solving each least
 the number of ratings the user generated in updating user factors,
 or the number of ratings the product received in updating product factors.
 This approach is named "ALS-WR" and discussed in the paper
-"[Large-Scale Parallel Collaborative Filtering for the Netflix Prize](http://dx.doi.org/10.1007/978-3-540-68880-8_32)".
+"[Large-Scale Parallel Collaborative Filtering for the Netflix Prize](https://doi.org/10.1007/978-3-540-68880-8_32)".
 It makes `lambda` less dependent on the scale of the dataset, so we can apply the
 best parameter learned from a sampled subset to the full dataset and expect similar performance.
docs/mllib-frequent-pattern-mining.md

Lines changed: 3 additions & 3 deletions
@@ -15,15 +15,15 @@ a popular algorithm to mining frequent itemsets.
 ## FP-growth
 
 The FP-growth algorithm is described in the paper
-[Han et al., Mining frequent patterns without candidate generation](http://dx.doi.org/10.1145/335191.335372),
+[Han et al., Mining frequent patterns without candidate generation](https://doi.org/10.1145/335191.335372),
 where "FP" stands for frequent pattern.
 Given a dataset of transactions, the first step of FP-growth is to calculate item frequencies and identify frequent items.
 Different from [Apriori-like](http://en.wikipedia.org/wiki/Apriori_algorithm) algorithms designed for the same purpose,
 the second step of FP-growth uses a suffix tree (FP-tree) structure to encode transactions without generating candidate sets
 explicitly, which are usually expensive to generate.
 After the second step, the frequent itemsets can be extracted from the FP-tree.
 In `spark.mllib`, we implemented a parallel version of FP-growth called PFP,
-as described in [Li et al., PFP: Parallel FP-growth for query recommendation](http://dx.doi.org/10.1145/1454008.1454027).
+as described in [Li et al., PFP: Parallel FP-growth for query recommendation](https://doi.org/10.1145/1454008.1454027).
 PFP distributes the work of growing FP-trees based on the suffixes of transactions,
 and hence more scalable than a single-machine implementation.
 We refer users to the papers for more details.
@@ -122,7 +122,7 @@ Refer to the [`AssociationRules` Java docs](api/java/org/apache/spark/mllib/fpm/
 
 PrefixSpan is a sequential pattern mining algorithm described in
 [Pei et al., Mining Sequential Patterns by Pattern-Growth: The
-PrefixSpan Approach](http://dx.doi.org/10.1109%2FTKDE.2004.77). We refer
+PrefixSpan Approach](https://doi.org/10.1109%2FTKDE.2004.77). We refer
 the reader to the referenced paper for formalizing the sequential
 pattern mining problem.
