You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[SPARK-26525][SHUFFLE] Fast release ShuffleBlockFetcherIterator on completion of the iteration
## What changes were proposed in this pull request?
Currently, spark would not release ShuffleBlockFetcherIterator until the whole task finished.In some conditions, it incurs memory leak.
An example is `rdd.repartition(m).coalesce(n, shuffle = false).save`, each `ShuffleBlockFetcherIterator` contains some metas about mapStatus(`blocksByAddress`) and each resultTask will keep n(max to shuffle partitions) shuffleBlockFetcherIterator and the memory would never released until the task completion, for they are referenced by the completion callbacks of TaskContext. In some case, it may take huge memory and incurs OOM.
Actually, We can release ShuffleBlockFetcherIterator as soon as it's consumed.
This PR is to resolve this problem.
## How was this patch tested?
unittest
Please review http://spark.apache.org/contributing.html before opening a pull request.
Closesapache#23438 from liupc/Fast-release-shuffleblockfetcheriterator.
Lead-authored-by: Liupengcheng <[email protected]>
Co-authored-by: liupengcheng <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
0 commit comments