
Commit 1a915bf

gaborgsomogyi authored and srowen committed
[MINOR][SQL][DOCS] failOnDataLoss has effect on batch queries so fix the doc
## What changes were proposed in this pull request?

According to the [Kafka integration document](https://spark.apache.org/docs/2.4.0/structured-streaming-kafka-integration.html), `failOnDataLoss` has an effect only on streaming queries. While implementing the DSv2 Kafka batch sources I realized this is not true: the behavior is covered in [KafkaDontFailOnDataLossSuite](https://github.com/apache/spark/blob/54da3bbfb2c936827897c52ed6e5f0f428b98e9f/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDontFailOnDataLossSuite.scala#L180). In this PR I've updated the doc to reflect this behavior.

## How was this patch tested?

```
cd docs/
SKIP_API=1 jekyll build
```

Manual webpage check.

Closes apache#24932 from gaborgsomogyi/failOnDataLoss.

Authored-by: Gabor Somogyi <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
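To illustrate the behavior this commit documents, here is a minimal sketch of a *batch* Kafka read where the option applies; the broker address and topic name are hypothetical, and only `spark.read` (versus `spark.readStream`) distinguishes it from the streaming case:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("failOnDataLoss-batch-sketch")
  .getOrCreate()

// Batch query: spark.read, not spark.readStream. With the default
// failOnDataLoss=true, the read fails if the requested offsets can no
// longer be served (e.g. the topic was deleted or records were aged out);
// setting it to false skips the lost range instead of failing.
val df = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // hypothetical broker
  .option("subscribe", "example-topic")                // hypothetical topic
  .option("startingOffsets", "earliest")
  .option("endingOffsets", "latest")
  .option("failOnDataLoss", "false")
  .load()

df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").show()
```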
1 parent 5a7aa6f commit 1a915bf

File tree

1 file changed: +2 -3 lines changed


docs/structured-streaming-kafka-integration.md

Lines changed: 2 additions & 3 deletions
@@ -355,11 +355,10 @@ The following configurations are optional:
   <td>failOnDataLoss</td>
   <td>true or false</td>
   <td>true</td>
-  <td>streaming query</td>
+  <td>streaming and batch</td>
   <td>Whether to fail the query when it's possible that data is lost (e.g., topics are deleted, or
   offsets are out of range). This may be a false alarm. You can disable it when it doesn't work
-  as you expected. Batch queries will always fail if it fails to read any data from the provided
-  offsets due to lost data.</td>
+  as you expected.</td>
   </tr>
   <tr>
   <td>kafkaConsumer.pollTimeoutMs</td>
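For contrast with the table entry changed above, a minimal sketch of the streaming counterpart, the case the doc already covered before this change; broker address and topic name are again hypothetical:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("failOnDataLoss-streaming-sketch")
  .getOrCreate()

// Streaming query: the same option, set on spark.readStream.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // hypothetical broker
  .option("subscribe", "example-topic")                // hypothetical topic
  .option("failOnDataLoss", "false")
  .load()
```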
