
Commit 06c70ba

viirya authored and huaxingao committed
[MINOR][SQL] Move iterator.hasNext into try block in executeTask
### What changes were proposed in this pull request?

This patch moves `iterator.hasNext` into the try block of `tryWithSafeFinallyAndFailureCallbacks` in `FileFormatWriter.executeTask`.

### Why are the changes needed?

It is not only `dataWriter.writeWithIterator(iterator)` that can fail; `iterator.hasNext` can also throw an error such as:

```
org.apache.spark.shuffle.FetchFailedException: Block shuffle_1_106_21 is corrupted but checksum verification passed
```

Because `iterator.hasNext` is not wrapped in the try block, `abort` is never called on the committer when it fails. Since `setupTask` has already been called at that point, it is safer to call `abort` for any error that happens after it.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48360 from viirya/try_block.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: huaxingao <[email protected]>
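To make the failure mode concrete, here is a minimal, self-contained sketch. It is not Spark's code: the plain `try`/`catch` stands in for `tryWithSafeFinallyAndFailureCallbacks`, and `Committer`, `HasNextHazard`, and `rows` are hypothetical stand-ins. Before this patch, the `hasNext` call sat outside the try block, so a fetch failure there skipped the abort path entirely:

```scala
// Hypothetical stand-ins for illustration only; not Spark's actual classes.
trait Committer {
  def setupTask(): Unit
  def abortTask(): Unit
}

object HasNextHazard extends App {
  val committer = new Committer {
    def setupTask(): Unit = println("setupTask called")
    def abortTask(): Unit = println("abortTask called")
  }

  // An iterator whose hasNext fails, mimicking a corrupted shuffle fetch.
  val rows = new Iterator[Int] {
    def hasNext: Boolean = throw new RuntimeException("corrupted shuffle block")
    def next(): Int = 0
  }

  committer.setupTask()

  // Before the patch (buggy): hasNext was evaluated outside the try block,
  // so the exception propagated without abortTask ever running:
  //   val isEmpty = !rows.hasNext   // throws; the task is never aborted

  // After the patch: everything past setupTask runs inside the try block,
  // so a failure in hasNext also reaches the abort path.
  try {
    val isEmpty = !rows.hasNext
    println(s"partition empty: $isEmpty")
  } catch {
    case e: Throwable =>
      committer.abortTask() // cleanup now runs even when hasNext fails
      throw e
  }
}
```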
1 parent: 37f2966 · commit: 06c70ba


sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala

Lines changed: 27 additions & 18 deletions

```diff
@@ -383,32 +383,41 @@ object FileFormatWriter extends Logging {

     committer.setupTask(taskAttemptContext)

-    val dataWriter =
-      if (sparkPartitionId != 0 && !iterator.hasNext) {
-        // In case of empty job, leave first partition to save meta for file format like parquet.
-        new EmptyDirectoryDataWriter(description, taskAttemptContext, committer)
-      } else if (description.partitionColumns.isEmpty && description.bucketSpec.isEmpty) {
-        new SingleDirectoryDataWriter(description, taskAttemptContext, committer)
-      } else {
-        concurrentOutputWriterSpec match {
-          case Some(spec) =>
-            new DynamicPartitionDataConcurrentWriter(
-              description, taskAttemptContext, committer, spec)
-          case _ =>
-            new DynamicPartitionDataSingleWriter(description, taskAttemptContext, committer)
-        }
-      }
+    var dataWriter: FileFormatDataWriter = null

     Utils.tryWithSafeFinallyAndFailureCallbacks(block = {
+      dataWriter =
+        if (sparkPartitionId != 0 && !iterator.hasNext) {
+          // In case of empty job, leave first partition to save meta for file format like parquet.
+          new EmptyDirectoryDataWriter(description, taskAttemptContext, committer)
+        } else if (description.partitionColumns.isEmpty && description.bucketSpec.isEmpty) {
+          new SingleDirectoryDataWriter(description, taskAttemptContext, committer)
+        } else {
+          concurrentOutputWriterSpec match {
+            case Some(spec) =>
+              new DynamicPartitionDataConcurrentWriter(
+                description, taskAttemptContext, committer, spec)
+            case _ =>
+              new DynamicPartitionDataSingleWriter(description, taskAttemptContext, committer)
+          }
+        }
+
       // Execute the task to write rows out and commit the task.
       dataWriter.writeWithIterator(iterator)
       dataWriter.commit()
     })(catchBlock = {
       // If there is an error, abort the task
-      dataWriter.abort()
-      logError(log"Job ${MDC(JOB_ID, jobId)} aborted.")
+      if (dataWriter != null) {
+        dataWriter.abort()
+      } else {
+        committer.abortTask(taskAttemptContext)
+      }
+      logError(log"Job: ${MDC(JOB_ID, jobId)}, Task: ${MDC(TASK_ID, taskId)}, " +
+        log"Task attempt ${MDC(TASK_ATTEMPT_ID, taskAttemptId)} aborted.")
     }, finallyBlock = {
-      dataWriter.close()
+      if (dataWriter != null) {
+        dataWriter.close()
+      }
     })
   }
```
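For context on the wrapper used above: the control flow of `Utils.tryWithSafeFinallyAndFailureCallbacks` can be approximated by the simplified sketch below. This is an approximation only, not Spark's actual implementation in `org.apache.spark.util.Utils`, which does more around exception handling:

```scala
object TryWithCallbacksSketch {
  // A simplified approximation of Utils.tryWithSafeFinallyAndFailureCallbacks,
  // shown only to clarify when catchBlock and finallyBlock run.
  def tryWithCallbacks[T](block: => T)(catchBlock: => Unit, finallyBlock: => Unit): T = {
    try {
      block
    } catch {
      case t: Throwable =>
        // Run the failure callback, but never let it mask the original error.
        try catchBlock catch {
          case inner: Throwable => t.addSuppressed(inner)
        }
        throw t
    } finally {
      // Runs whether block succeeded or failed.
      finallyBlock
    }
  }
}
```

With the patch applied, a failure in `iterator.hasNext` now happens inside `block`, so `catchBlock` runs; its `dataWriter != null` guard falls back to `committer.abortTask(taskAttemptContext)` when the writer was never constructed, and `finallyBlock` skips `close()` for the same reason.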
