Commit bbbf814

Bo Meng authored and committed
[SPARK-22357][CORE] SparkContext.binaryFiles ignore minPartitions parameter
## What changes were proposed in this pull request? Fix the issue that minPartitions was not used in the method. This is a simple fix and I am not trying to make it complicated. The purpose is to still allow user to control the defaultParallelism through the value of minPartitions, while also via sc.defaultParallelism parameters. ## How was this patch tested? I have not provided the additional test since the fix is very straightforward. Closes apache#21638 from bomeng/22357. Lead-authored-by: Bo Meng <[email protected]> Co-authored-by: Bo Meng <[email protected]> Signed-off-by: Sean Owen <[email protected]>
1 parent 1038540 commit bbbf814

File tree: 1 file changed, +1 −1 lines changed


core/src/main/scala/org/apache/spark/input/PortableDataStream.scala

Lines changed: 1 addition & 1 deletion
```diff
@@ -47,7 +47,7 @@ private[spark] abstract class StreamFileInputFormat[T]
   def setMinPartitions(sc: SparkContext, context: JobContext, minPartitions: Int) {
     val defaultMaxSplitBytes = sc.getConf.get(config.FILES_MAX_PARTITION_BYTES)
     val openCostInBytes = sc.getConf.get(config.FILES_OPEN_COST_IN_BYTES)
-    val defaultParallelism = sc.defaultParallelism
+    val defaultParallelism = Math.max(sc.defaultParallelism, minPartitions)
     val files = listStatus(context).asScala
     val totalBytes = files.filterNot(_.isDirectory).map(_.getLen + openCostInBytes).sum
     val bytesPerCore = totalBytes / defaultParallelism
```
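The effect of the one-line change can be sketched as a standalone function. The names mirror the diff above, but note the assumptions: the final combination of bytesPerCore with defaultMaxSplitBytes and openCostInBytes happens below the lines shown in the hunk, so that min/max step, the object name, and the sample numbers in the usage note are illustrative rather than taken from the source.

```scala
// Standalone sketch of the split-size arithmetic in setMinPartitions.
// Names mirror the diff; the final min/max combination is assumed,
// since it falls outside the hunk shown above.
object SplitSizeSketch {
  def maxSplitSize(totalBytes: Long,
                   defaultParallelism: Int,
                   minPartitions: Int,
                   defaultMaxSplitBytes: Long,
                   openCostInBytes: Long): Long = {
    // The fix: never divide by fewer slots than the caller asked for.
    val parallelism = Math.max(defaultParallelism, minPartitions)
    val bytesPerCore = totalBytes / parallelism
    // Clamp the split size between the open cost and the configured maximum.
    Math.min(defaultMaxSplitBytes, Math.max(openCostInBytes, bytesPerCore))
  }
}
```

For example, with totalBytes = 1000, defaultParallelism = 2, defaultMaxSplitBytes = 128 and openCostInBytes = 4: before the fix minPartitions was ignored, so bytesPerCore = 500 and the split size capped at 128, giving at most ~8 splits regardless of the request. After the fix, passing minPartitions = 10 raises the divisor to 10, so bytesPerCore = 100 and the split size drops to 100, producing the finer partitioning the caller asked for.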
