
Commit b21b806

zhengruifeng authored and srowen committed
[SPARK-22075][ML] GBTs unpersist datasets cached by Checkpointer
## What changes were proposed in this pull request?

`PeriodicRDDCheckpointer` automatically persists the last 3 datasets passed to `PeriodicRDDCheckpointer.update()`. As a result, in GBTs the last 3 intermediate RDDs remain cached after `fit()` returns. This patch calls `unpersistDataSet()` on both checkpointers before their checkpoints are deleted.

## How was this patch tested?

Existing tests and a local test in spark-shell.

Author: Zheng RuiFeng <[email protected]>

Closes apache#19288 from zhengruifeng/gbt_unpersist.
1 parent 9cac249 commit b21b806


1 file changed, +2 -0 lines changed

mllib/src/main/scala/org/apache/spark/ml/tree/impl/GradientBoostedTrees.scala

Lines changed: 2 additions & 0 deletions
@@ -360,7 +360,9 @@ private[spark] object GradientBoostedTrees extends Logging {
 logInfo("Internal timing for DecisionTree:")
 logInfo(s"$timer")
 
+predErrorCheckpointer.unpersistDataSet()
 predErrorCheckpointer.deleteAllCheckpoints()
+validatePredErrorCheckpointer.unpersistDataSet()
 validatePredErrorCheckpointer.deleteAllCheckpoints()
 if (persistedInput) input.unpersist()
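To illustrate why `fit()` left data cached, here is a minimal, self-contained Scala model of the checkpointer's persistence queue. This is a sketch under stated assumptions, not Spark's implementation. `ToyDataset` and `ToyCheckpointer` are hypothetical stand-ins with no Spark dependency, checkpointing itself is not modeled, and the limit of 3 persisted datasets comes from the commit message. Each `update()` persists the new dataset and unpersists only the oldest ones beyond the limit, so the last three stay cached until `unpersistDataSet()` drains the queue. That drain is the call this commit adds for both checkpointers.

```scala
import scala.collection.mutable

// Hypothetical stand-in for an RDD: only tracks whether it is "cached".
final class ToyDataset(val id: Int) {
  var persisted: Boolean = false
}

// Hypothetical, simplified model of the persistence queue kept by a periodic
// checkpointer. Only caching is modeled; checkpoint files are ignored.
final class ToyCheckpointer(maxPersisted: Int = 3) {
  private val persistedQueue = mutable.Queue.empty[ToyDataset]

  // Persist the new dataset, then release the oldest ones beyond the limit.
  def update(data: ToyDataset): Unit = {
    data.persisted = true
    persistedQueue.enqueue(data)
    while (persistedQueue.size > maxPersisted) {
      val oldest = persistedQueue.dequeue()
      oldest.persisted = false
    }
  }

  // Release every dataset still held in the queue (the cleanup this commit adds).
  def unpersistDataSet(): Unit = {
    while (persistedQueue.nonEmpty) {
      val data = persistedQueue.dequeue()
      data.persisted = false
    }
  }
}

object CheckpointerLeakDemo {
  def main(args: Array[String]): Unit = {
    val checkpointer = new ToyCheckpointer()
    // One intermediate dataset per boosting iteration.
    val datasets = (1 to 10).map(new ToyDataset(_))
    datasets.foreach(checkpointer.update)

    // Without cleanup, the last three datasets are still marked as cached.
    println(s"cached after training loop: ${datasets.filter(_.persisted).map(_.id)}")

    checkpointer.unpersistDataSet()
    println(s"cached after unpersistDataSet(): ${datasets.filter(_.persisted).map(_.id)}")
  }
}
```

In the model, only `unpersistDataSet()` touches the remaining queue entries. Deleting checkpoint files alone, which is all the pre-patch code did, would not release those cached datasets.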