This repository was archived by the owner on Jan 9, 2020. It is now read-only.

Commit 7b50736

assafmendelson authored and zsxwing committed
[SPARK-21123][DOCS][STRUCTURED STREAMING] Options for file stream source are in a wrong table
## What changes were proposed in this pull request?

The description for several options of File Source for structured streaming appeared in the File Sink description instead.

This pull request has two commits: The first includes changes to the version as it appeared in spark 2.1 and the second handled an additional option added for spark 2.2

## How was this patch tested?

Built the documentation by SKIP_API=1 jekyll build and visually inspected the structured streaming programming guide.

The original documentation was written by tdas and lw-lin

Author: assafmendelson <[email protected]>

Closes apache#18342 from assafmendelson/spark-21123.

(cherry picked from commit 66a792c)
Signed-off-by: Shixiong Zhu <[email protected]>
1 parent f7fcdec commit 7b50736

1 file changed

+15
-13
lines changed

docs/structured-streaming-programming-guide.md

Lines changed: 15 additions & 13 deletions
@@ -510,7 +510,20 @@ Here are the details of all the sources in Spark.
 <td><b>File source</b></td>
 <td>
 <code>path</code>: path to the input directory, and common to all file formats.
-<br/><br/>
+<br/>
+<code>maxFilesPerTrigger</code>: maximum number of new files to be considered in every trigger (default: no max)
+<br/>
+<code>latestFirst</code>: whether to processs the latest new files first, useful when there is a large backlog of files (default: false)
+<br/>
+<code>fileNameOnly</code>: whether to check new files based on only the filename instead of on the full path (default: false). With this set to `true`, the following files would be considered as the same file, because their filenames, "dataset.txt", are the same:
+<br/>
+· "file:///dataset.txt"<br/>
+· "s3://a/dataset.txt"<br/>
+· "s3n://a/b/dataset.txt"<br/>
+· "s3a://a/b/c/dataset.txt"<br/>
+<br/>
+
+<br/>
 For file-format-specific options, see the related methods in <code>DataStreamReader</code>
 (<a href="api/scala/index.html#org.apache.spark.sql.streaming.DataStreamReader">Scala</a>/<a href="api/java/org/apache/spark/sql/streaming/DataStreamReader.html">Java</a>/<a href="api/python/pyspark.sql.html#pyspark.sql.streaming.DataStreamReader">Python</a>/<a
 href="api/R/read.stream.html">R</a>).
@@ -1234,18 +1247,7 @@ Here are the details of all the sinks in Spark.
 <td>Append</td>
 <td>
 <code>path</code>: path to the output directory, must be specified.
-<br/>
-<code>maxFilesPerTrigger</code>: maximum number of new files to be considered in every trigger (default: no max)
-<br/>
-<code>latestFirst</code>: whether to processs the latest new files first, useful when there is a large backlog of files (default: false)
-<br/>
-<code>fileNameOnly</code>: whether to check new files based on only the filename instead of on the full path (default: false). With this set to `true`, the following files would be considered as the same file, because their filenames, "dataset.txt", are the same:
-<br/>
-· "file:///dataset.txt"<br/>
-· "s3://a/dataset.txt"<br/>
-· "s3n://a/b/dataset.txt"<br/>
-· "s3a://a/b/c/dataset.txt"<br/>
-<br/>
+<br/><br/>
 For file-format-specific options, see the related methods in DataFrameWriter
 (<a href="api/scala/index.html#org.apache.spark.sql.DataFrameWriter">Scala</a>/<a href="api/java/org/apache/spark/sql/DataFrameWriter.html">Java</a>/<a href="api/python/pyspark.sql.html#pyspark.sql.DataFrameWriter">Python</a>/<a
 href="api/R/write.stream.html">R</a>).
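
The `fileNameOnly` description moved by this commit says that files are compared by filename alone rather than by full path. A minimal sketch of that comparison, using only the Python standard library, might look like the following. This is an illustration and not Spark's implementation: the helper `file_key` and the variable names are hypothetical, and only the four example URIs come from the documentation text.

```python
import posixpath
from urllib.parse import urlparse


def file_key(uri: str, file_name_only: bool = False) -> str:
    """Return the identity used to decide whether a file was already seen.

    Hypothetical helper: with file_name_only=True only the last path
    component is kept, otherwise the full URI is the identity.
    """
    if file_name_only:
        return posixpath.basename(urlparse(uri).path)
    return uri


# The four example URIs listed in the fileNameOnly description.
uris = [
    "file:///dataset.txt",
    "s3://a/dataset.txt",
    "s3n://a/b/dataset.txt",
    "s3a://a/b/c/dataset.txt",
]

# Default behaviour: every distinct full path is a distinct file.
full_path_keys = {file_key(u) for u in uris}

# With fileNameOnly semantics all four collapse to one identity.
name_only_keys = {file_key(u, file_name_only=True) for u in uris}
```

Under these assumptions `full_path_keys` holds four distinct entries while `name_only_keys` holds a single one, which matches the documentation's statement that the four files "would be considered as the same file".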