
Commit 571ce7c

dsakuma authored and Robert Kruszewski committed
[MINOR][DOC] Fix some typos and grammar issues
## What changes were proposed in this pull request?

Easy fix in the documentation.

## How was this patch tested?

N/A

Closes apache#20948

Author: Daniel Sakuma <[email protected]>

Closes apache#20928 from dsakuma/fix_typo_configuration_docs.
1 parent 1a16d33 commit 571ce7c

Some content is hidden: large commits have some content hidden by default, so not every changed file is shown below.

43 files changed (+107 additions, -107 deletions)

docs/README.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ here with the Spark source code. You can also find documentation specific to rel
 Spark at http://spark.apache.org/documentation.html.

 Read on to learn more about viewing documentation in plain text (i.e., markdown) or building the
-documentation yourself. Why build it yourself? So that you have the docs that corresponds to
+documentation yourself. Why build it yourself? So that you have the docs that correspond to
 whichever version of Spark you currently have checked out of revision control.

 ## Prerequisites

docs/_plugins/include_example.rb

Lines changed: 1 addition & 1 deletion
@@ -48,7 +48,7 @@ def render(context)
 begin
 code = File.open(@file).read.encode("UTF-8")
 rescue => e
-# We need to explicitly exit on execptions here because Jekyll will silently swallow
+# We need to explicitly exit on exceptions here because Jekyll will silently swallow
 # them, leading to silent build failures (see https://github.com/jekyll/jekyll/issues/5104)
 puts(e)
 puts(e.backtrace)

docs/building-spark.md

Lines changed: 1 addition & 1 deletion
@@ -113,7 +113,7 @@ Note: Flume support is deprecated as of Spark 2.3.0.

 ## Building submodules individually

-It's possible to build Spark sub-modules using the `mvn -pl` option.
+It's possible to build Spark submodules using the `mvn -pl` option.

 For instance, you can build the Spark Streaming module using:

docs/cloud-integration.md

Lines changed: 2 additions & 2 deletions
@@ -27,13 +27,13 @@ description: Introduction to cloud storage support in Apache Spark SPARK_VERSION
 All major cloud providers offer persistent data storage in *object stores*.
 These are not classic "POSIX" file systems.
 In order to store hundreds of petabytes of data without any single points of failure,
-object stores replace the classic filesystem directory tree
+object stores replace the classic file system directory tree
 with a simpler model of `object-name => data`. To enable remote access, operations
 on objects are usually offered as (slow) HTTP REST operations.

 Spark can read and write data in object stores through filesystem connectors implemented
 in Hadoop or provided by the infrastructure suppliers themselves.
-These connectors make the object stores look *almost* like filesystems, with directories and files
+These connectors make the object stores look *almost* like file systems, with directories and files
 and the classic operations on them such as list, delete and rename.
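For context, a minimal Scala sketch of reading and writing through such a connector (not part of the commit). It assumes the relevant connector (for example Hadoop's `s3a`) and credentials are already configured, and `my-bucket` is a placeholder bucket name:

```scala
import org.apache.spark.sql.SparkSession

// Assumes the s3a connector and credentials are already configured; "my-bucket" is a placeholder.
val spark = SparkSession.builder().appName("object-store-example").getOrCreate()

// Object-store data is addressed by URL, much like file system paths.
val logs = spark.read.text("s3a://my-bucket/raw-logs/")
logs.write.parquet("s3a://my-bucket/logs-parquet/")
```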

docs/configuration.md

Lines changed: 10 additions & 10 deletions
@@ -558,7 +558,7 @@ Apart from these, the following properties are also available, and may be useful
 <td>
 This configuration limits the number of remote requests to fetch blocks at any given point.
 When the number of hosts in the cluster increase, it might lead to very large number
-of in-bound connections to one or more nodes, causing the workers to fail under load.
+of inbound connections to one or more nodes, causing the workers to fail under load.
 By allowing it to limit the number of fetch requests, this scenario can be mitigated.
 </td>
 </tr>
@@ -1288,7 +1288,7 @@ Apart from these, the following properties are also available, and may be useful
 <td>4194304 (4 MB)</td>
 <td>
 The estimated cost to open a file, measured by the number of bytes could be scanned at the same
-time. This is used when putting multiple files into a partition. It is better to over estimate,
+time. This is used when putting multiple files into a partition. It is better to overestimate,
 then the partitions with small files will be faster than partitions with bigger files.
 </td>
 </tr>
@@ -1513,7 +1513,7 @@ Apart from these, the following properties are also available, and may be useful
 <td>0.8 for KUBERNETES mode; 0.8 for YARN mode; 0.0 for standalone mode and Mesos coarse-grained mode</td>
 <td>
 The minimum ratio of registered resources (registered resources / total expected resources)
-(resources are executors in yarn mode and Kubernetes mode, CPU cores in standalone mode and Mesos coarsed-grained
+(resources are executors in yarn mode and Kubernetes mode, CPU cores in standalone mode and Mesos coarse-grained
 mode ['spark.cores.max' value is total expected resources for Mesos coarse-grained mode] )
 to wait for before scheduling begins. Specified as a double between 0.0 and 1.0.
 Regardless of whether the minimum ratio of resources has been reached,
@@ -1634,7 +1634,7 @@ Apart from these, the following properties are also available, and may be useful
 <td>false</td>
 <td>
 (Experimental) If set to "true", Spark will blacklist the executor immediately when a fetch
-failure happenes. If external shuffle service is enabled, then the whole node will be
+failure happens. If external shuffle service is enabled, then the whole node will be
 blacklisted.
 </td>
 </tr>
@@ -1722,7 +1722,7 @@ Apart from these, the following properties are also available, and may be useful
 When <code>spark.task.reaper.enabled = true</code>, this setting specifies a timeout after
 which the executor JVM will kill itself if a killed task has not stopped running. The default
 value, -1, disables this mechanism and prevents the executor from self-destructing. The purpose
-of this setting is to act as a safety-net to prevent runaway uncancellable tasks from rendering
+of this setting is to act as a safety-net to prevent runaway noncancellable tasks from rendering
 an executor unusable.
 </td>
 </tr>
@@ -1915,8 +1915,8 @@ showDF(properties, numRows = 200, truncate = FALSE)
 <td><code>spark.streaming.receiver.writeAheadLog.enable</code></td>
 <td>false</td>
 <td>
-Enable write ahead logs for receivers. All the input data received through receivers
-will be saved to write ahead logs that will allow it to be recovered after driver failures.
+Enable write-ahead logs for receivers. All the input data received through receivers
+will be saved to write-ahead logs that will allow it to be recovered after driver failures.
 See the <a href="streaming-programming-guide.html#deploying-applications">deployment guide</a>
 in the Spark Streaming programing guide for more details.
 </td>
@@ -1971,7 +1971,7 @@ showDF(properties, numRows = 200, truncate = FALSE)
 <td><code>spark.streaming.driver.writeAheadLog.closeFileAfterWrite</code></td>
 <td>false</td>
 <td>
-Whether to close the file after writing a write ahead log record on the driver. Set this to 'true'
+Whether to close the file after writing a write-ahead log record on the driver. Set this to 'true'
 when you want to use S3 (or any file system that does not support flushing) for the metadata WAL
 on the driver.
 </td>
@@ -1980,7 +1980,7 @@ showDF(properties, numRows = 200, truncate = FALSE)
 <td><code>spark.streaming.receiver.writeAheadLog.closeFileAfterWrite</code></td>
 <td>false</td>
 <td>
-Whether to close the file after writing a write ahead log record on the receivers. Set this to 'true'
+Whether to close the file after writing a write-ahead log record on the receivers. Set this to 'true'
 when you want to use S3 (or any file system that does not support flushing) for the data WAL
 on the receivers.
 </td>
@@ -2178,7 +2178,7 @@ Spark's classpath for each application. In a Spark cluster running on YARN, thes
 files are set cluster-wide, and cannot safely be changed by the application.

 The better choice is to use spark hadoop properties in the form of `spark.hadoop.*`.
-They can be considered as same as normal spark properties which can be set in `$SPARK_HOME/conf/spark-defalut.conf`
+They can be considered as same as normal spark properties which can be set in `$SPARK_HOME/conf/spark-default.conf`

 In some cases, you may want to avoid hard-coding certain configurations in a `SparkConf`. For
 instance, Spark allows you to simply create an empty conf and set spark/spark hadoop properties.
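To illustrate the `spark.hadoop.*` pattern from the last hunk, a minimal Scala sketch that sets a Hadoop option on an otherwise empty conf (not part of the commit; the property `spark.hadoop.fs.s3a.connection.maximum` and its value are placeholders):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Options prefixed with "spark.hadoop." are passed through to the Hadoop configuration.
val conf = new SparkConf()
  .set("spark.hadoop.fs.s3a.connection.maximum", "100") // placeholder Hadoop property and value

val spark = SparkSession.builder()
  .appName("hadoop-properties-example")
  .config(conf)
  .getOrCreate()
```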

docs/css/pygments-default.css

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ To generate this, I had to run
 But first I had to install pygments via easy_install pygments

 I had to override the conflicting bootstrap style rules by linking to
-this stylesheet lower in the html than the bootstap css.
+this stylesheet lower in the html than the bootstrap css.

 Also, I was thrown off for a while at first when I was using markdown
 code block inside my {% highlight scala %} ... {% endhighlight %} tags

docs/graphx-programming-guide.md

Lines changed: 2 additions & 2 deletions
@@ -491,7 +491,7 @@ val joinedGraph = graph.joinVertices(uniqueCosts)(
 The more general [`outerJoinVertices`][Graph.outerJoinVertices] behaves similarly to `joinVertices`
 except that the user defined `map` function is applied to all vertices and can change the vertex
 property type. Because not all vertices may have a matching value in the input RDD the `map`
-function takes an `Option` type. For example, we can setup a graph for PageRank by initializing
+function takes an `Option` type. For example, we can set up a graph for PageRank by initializing
 vertex properties with their `outDegree`.
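As a reminder of the pattern this hunk describes, a minimal Scala sketch of the `outerJoinVertices` setup (not part of the commit), assuming a `graph: Graph[Double, String]` has already been constructed; types and names are illustrative:

```scala
import org.apache.spark.graphx.{Graph, VertexRDD}

// Join each vertex with its out-degree; vertices without outgoing edges have no matching value.
val outDegrees: VertexRDD[Int] = graph.outDegrees
val degreeGraph = graph.outerJoinVertices(outDegrees) { (id, oldAttr, outDegOpt) =>
  outDegOpt.getOrElse(0) // default the vertex property to 0 when there is no match
}
```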

@@ -969,7 +969,7 @@ A vertex is part of a triangle when it has two adjacent vertices with an edge be
 # Examples

 Suppose I want to build a graph from some text files, restrict the graph
-to important relationships and users, run page-rank on the sub-graph, and
+to important relationships and users, run page-rank on the subgraph, and
 then finally return attributes associated with the top users. I can do
 all of this in just a few lines with GraphX:

docs/job-scheduling.md

Lines changed: 2 additions & 2 deletions
@@ -23,7 +23,7 @@ run tasks and store data for that application. If multiple users need to share y
 different options to manage allocation, depending on the cluster manager.

 The simplest option, available on all cluster managers, is _static partitioning_ of resources. With
-this approach, each application is given a maximum amount of resources it can use, and holds onto them
+this approach, each application is given a maximum amount of resources it can use and holds onto them
 for its whole duration. This is the approach used in Spark's [standalone](spark-standalone.html)
 and [YARN](running-on-yarn.html) modes, as well as the
 [coarse-grained Mesos mode](running-on-mesos.html#mesos-run-modes).
@@ -230,7 +230,7 @@ properties:
 * `minShare`: Apart from an overall weight, each pool can be given a _minimum shares_ (as a number of
 CPU cores) that the administrator would like it to have. The fair scheduler always attempts to meet
 all active pools' minimum shares before redistributing extra resources according to the weights.
-The `minShare` property can therefore be another way to ensure that a pool can always get up to a
+The `minShare` property can, therefore, be another way to ensure that a pool can always get up to a
 certain number of resources (e.g. 10 cores) quickly without giving it a high priority for the rest
 of the cluster. By default, each pool's `minShare` is 0.
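To make the pool properties above concrete, a minimal Scala sketch of enabling fair scheduling and submitting jobs to a named pool (not part of the commit). The allocation file path and the pool name `production` are placeholders; the pool's `weight` and `minShare` would be declared in that XML file:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("fair-pools-example")
  .set("spark.scheduler.mode", "FAIR")
  // Placeholder path to an XML file that declares pools (schedulingMode, weight, minShare).
  .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")

val sc = new SparkContext(conf)

// Jobs submitted from this thread are routed to the "production" pool.
sc.setLocalProperty("spark.scheduler.pool", "production")
```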

docs/ml-advanced.md

Lines changed: 1 addition & 1 deletion
@@ -77,7 +77,7 @@ Quasi-Newton methods in this case. This fallback is currently always enabled for
 L1 regularization is applied (i.e. $\alpha = 0$), there exists an analytical solution and either Cholesky or Quasi-Newton solver may be used. When $\alpha > 0$ no analytical
 solution exists and we instead use the Quasi-Newton solver to find the coefficients iteratively.

-In order to make the normal equation approach efficient, `WeightedLeastSquares` requires that the number of features be no more than 4096. For larger problems, use L-BFGS instead.
+In order to make the normal equation approach efficient, `WeightedLeastSquares` requires that the number of features is no more than 4096. For larger problems, use L-BFGS instead.

 ## Iteratively reweighted least squares (IRLS)
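As a rough, commit-external illustration of that solver choice: `WeightedLeastSquares` backs the `normal` solver of `LinearRegression`, so switching the solver to `l-bfgs` is one user-level way to sidestep the 4096-feature limit:

```scala
import org.apache.spark.ml.regression.LinearRegression

// "normal" solves the normal equations via WeightedLeastSquares (limited to <= 4096 features);
// "l-bfgs" optimizes iteratively and has no such limit.
val lrNormal = new LinearRegression().setSolver("normal")
val lrLbfgs  = new LinearRegression().setSolver("l-bfgs")
```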

docs/ml-classification-regression.md

Lines changed: 3 additions & 3 deletions
@@ -420,7 +420,7 @@ Refer to the [R API docs](api/R/spark.svmLinear.html) for more details.

 [OneVsRest](http://en.wikipedia.org/wiki/Multiclass_classification#One-vs.-rest) is an example of a machine learning reduction for performing multiclass classification given a base classifier that can perform binary classification efficiently. It is also known as "One-vs-All."

-`OneVsRest` is implemented as an `Estimator`. For the base classifier it takes instances of `Classifier` and creates a binary classification problem for each of the k classes. The classifier for class i is trained to predict whether the label is i or not, distinguishing class i from all other classes.
+`OneVsRest` is implemented as an `Estimator`. For the base classifier, it takes instances of `Classifier` and creates a binary classification problem for each of the k classes. The classifier for class i is trained to predict whether the label is i or not, distinguishing class i from all other classes.

 Predictions are done by evaluating each binary classifier and the index of the most confident classifier is output as label.
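For reference, a minimal Scala sketch of the `OneVsRest` estimator this hunk describes (not part of the commit), assuming `train` and `test` are DataFrames with label and features columns:

```scala
import org.apache.spark.ml.classification.{LogisticRegression, OneVsRest}

// Base binary classifier; OneVsRest fits one copy of it per class.
val classifier = new LogisticRegression().setMaxIter(10)

val ovr = new OneVsRest().setClassifier(classifier)
val ovrModel = ovr.fit(train)              // `train` is an assumed DataFrame
val predictions = ovrModel.transform(test) // `test` is an assumed DataFrame
```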

@@ -908,7 +908,7 @@ Refer to the [R API docs](api/R/spark.survreg.html) for more details.
 belongs to the family of regression algorithms. Formally isotonic regression is a problem where
 given a finite set of real numbers `$Y = {y_1, y_2, ..., y_n}$` representing observed responses
 and `$X = {x_1, x_2, ..., x_n}$` the unknown response values to be fitted
-finding a function that minimises
+finding a function that minimizes

 `\begin{equation}
 f(x) = \sum_{i=1}^n w_i (y_i - x_i)^2
@@ -927,7 +927,7 @@ We implement a
 which uses an approach to
 [parallelizing isotonic regression](http://doi.org/10.1007/978-3-642-99789-1_10).
 The training input is a DataFrame which contains three columns
-label, features and weight. Additionally IsotonicRegression algorithm has one
+label, features and weight. Additionally, IsotonicRegression algorithm has one
 optional parameter called $isotonic$ defaulting to true.
 This argument specifies if the isotonic regression is
 isotonic (monotonically increasing) or antitonic (monotonically decreasing).
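And a minimal Scala sketch of the `IsotonicRegression` usage the second hunk describes (not part of the commit), assuming `dataset` is a DataFrame with `label`, `features` and `weight` columns:

```scala
import org.apache.spark.ml.regression.IsotonicRegression

val ir = new IsotonicRegression()
  .setIsotonic(true)      // true = isotonic (increasing), false = antitonic (decreasing)
  .setWeightCol("weight") // optional; instance weights default to 1.0 when unset

val model = ir.fit(dataset)           // `dataset` is an assumed training DataFrame
val fitted = model.transform(dataset) // adds a prediction column
```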
