Skip to content

Commit 15fff79

Browse files
squitojerryshao
authored andcommitted
[SPARK-24297][CORE] Fetch-to-disk by default for > 2gb
Fetch-to-mem is guaranteed to fail if the message is bigger than 2 GB, so we might as well use fetch-to-disk in that case. The message includes some metadata in addition to the block data itself (in particular UploadBlock has a lot of metadata), so we leave a little room. Author: Imran Rashid <[email protected]> Closes apache#21474 from squito/SPARK-24297.
1 parent 3efdf35 commit 15fff79

File tree

2 files changed

+11
-5
lines changed

2 files changed

+11
-5
lines changed

core/src/main/scala/org/apache/spark/internal/config/package.scala

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -432,7 +432,11 @@ package object config {
432432
"external shuffle service, this feature can only be worked when external shuffle" +
433433
"service is newer than Spark 2.2.")
434434
.bytesConf(ByteUnit.BYTE)
435-
.createWithDefault(Long.MaxValue)
435+
// fetch-to-mem is guaranteed to fail if the message is bigger than 2 GB, so we might
436+
// as well use fetch-to-disk in that case. The message includes some metadata in addition
437+
// to the block data itself (in particular UploadBlock has a lot of metadata), so we leave
438+
// extra room.
439+
.createWithDefault(Int.MaxValue - 512)
436440

437441
private[spark] val TASK_METRICS_TRACK_UPDATED_BLOCK_STATUSES =
438442
ConfigBuilder("spark.taskMetrics.trackUpdatedBlockStatuses")

docs/configuration.md

Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -580,13 +580,15 @@ Apart from these, the following properties are also available, and may be useful
580580
</tr>
581581
<tr>
582582
<td><code>spark.maxRemoteBlockSizeFetchToMem</code></td>
583-
<td>Long.MaxValue</td>
583+
<td>Int.MaxValue - 512</td>
584584
<td>
585585
The remote block will be fetched to disk when size of the block is above this threshold in bytes.
586-
This is to avoid a giant request takes too much memory. We can enable this config by setting
587-
a specific value(e.g. 200m). Note this configuration will affect both shuffle fetch
586+
This is to avoid a giant request that takes too much memory. By default, this is only enabled
587+
for blocks > 2GB, as those cannot be fetched directly into memory, no matter what resources are
588+
available. But it can be turned down to a much lower value (eg. 200m) to avoid using too much
589+
memory on smaller blocks as well. Note this configuration will affect both shuffle fetch
588590
and block manager remote block fetch. For users who enabled external shuffle service,
589-
this feature can only be worked when external shuffle service is newer than Spark 2.2.
591+
this feature can only be used when external shuffle service is newer than Spark 2.2.
590592
</td>
591593
</tr>
592594
<tr>

0 commit comments

Comments
 (0)