
Commit ca69558

Peng Meng authored and srowen committed
[SPARK-21638][ML] Fix RF/GBT Warning message error
## What changes were proposed in this pull request?

When training an RF model, many warning messages like this one are logged:

> WARN RandomForest: Tree learning is using approximately 268492800 bytes per iteration, which exceeds requested limit maxMemoryUsage=268435456. This allows splitting 2622 nodes in this iteration.

This warning is unnecessary, and the numbers it reports are not accurate. When a node does not fit into the remaining memory budget, it stays on the stack for the next iteration. The loop still adds that node's memory to `memUsage` and counts it in `numNodesInGroup`. As a result, the warning is shown whenever not all pending nodes can be split in one iteration. That is the common case, so the warning is usually logged on every iteration.

This patch updates `memUsage` and `numNodesInGroup` only for nodes that are actually added to the group. It also stops filling the group at the first node that does not fit.

## How was this patch tested?

Existing unit tests.

Author: Peng Meng <[email protected]>

Closes apache#18868 from mpjlu/fixRFwarning.
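The overcounting described above can be reproduced with a small model of the pre-patch loop. This is a sketch under stated assumptions, not Spark code. `PrePatchGrouping`, `Group`, `selectGroup` and `pending` are hypothetical names, and each pending node is reduced to a single byte estimate. The admission test `memUsage + nodeMemUsage <= maxMemoryUsage || memUsage == 0` is also an assumption, because that line falls outside the diff hunks shown on this page.

```scala
// Hypothetical, simplified model of the PRE-patch grouping loop.
// Each pending node is only a byte estimate; pending(taken) stands in for nodeStack.top.
// ASSUMPTION: the admission test below is not shown in this commit's diff hunks.
object PrePatchGrouping {
  final case class Group(memUsage: Long, numNodesInGroup: Int, nodesTaken: Int)

  def selectGroup(pending: IndexedSeq[Long], maxMemoryUsage: Long): Group = {
    var memUsage = 0L
    var numNodesInGroup = 0
    var taken = 0 // nodes actually popped off the stack into this group
    while (taken < pending.length && (memUsage < maxMemoryUsage || memUsage == 0)) {
      val nodeMemUsage = pending(taken)
      if (memUsage + nodeMemUsage <= maxMemoryUsage || memUsage == 0) {
        taken += 1 // corresponds to nodeStack.pop()
      }
      // Pre-patch behaviour: these run even when the node did NOT fit,
      // so the rejected node is counted although it stays on the stack.
      numNodesInGroup += 1
      memUsage += nodeMemUsage
    }
    Group(memUsage, numNodesInGroup, taken)
  }

  def main(args: Array[String]): Unit = {
    val g = selectGroup(Vector(100L, 100L, 100L, 100L, 100L), maxMemoryUsage = 250L)
    // Would the pre-patch code log the "exceeds requested limit" warning?
    println(s"$g warn=${g.memUsage > 250L}") // prints: Group(300,3,2) warn=true
  }
}
```

The example uses a 250-byte budget and five 100-byte nodes. Only two nodes are taken, yet `memUsage` ends at 300 and `numNodesInGroup` at 3. The warning would therefore fire and over-report the group size.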
1 parent 95ad960 commit ca69558


1 file changed, +6 -3 lines changed


mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala

Lines changed: 6 additions & 3 deletions
```diff
@@ -1089,7 +1089,8 @@ private[spark] object RandomForest extends Logging {
     var numNodesInGroup = 0
     // If maxMemoryInMB is set very small, we want to still try to split 1 node,
     // so we allow one iteration if memUsage == 0.
-    while (nodeStack.nonEmpty && (memUsage < maxMemoryUsage || memUsage == 0)) {
+    var groupDone = false
+    while (nodeStack.nonEmpty && !groupDone) {
       val (treeIndex, node) = nodeStack.top
       // Choose subset of features for node (if subsampling).
       val featureSubset: Option[Array[Int]] = if (metadata.subsamplingFeatures) {
@@ -1107,9 +1108,11 @@ private[spark] object RandomForest extends Logging {
         mutableTreeToNodeToIndexInfo
           .getOrElseUpdate(treeIndex, new mutable.HashMap[Int, NodeIndexInfo]())(node.id)
           = new NodeIndexInfo(numNodesInGroup, featureSubset)
+        numNodesInGroup += 1
+        memUsage += nodeMemUsage
+      } else {
+        groupDone = true
       }
-      numNodesInGroup += 1
-      memUsage += nodeMemUsage
     }
     if (memUsage > maxMemoryUsage) {
       // If maxMemoryUsage is 0, we should still allow splitting 1 node.
```
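Below is a minimal sketch of the patched loop under simplifying assumptions. `PatchedGrouping`, `Group`, `selectGroup`, `shouldWarn` and `pending` are hypothetical names, and feature subsampling and per-tree bookkeeping are omitted. The admission condition `memUsage + nodeMemUsage <= maxMemoryUsage || memUsage == 0` is assumed, because it sits between the two hunks and is not shown in the diff.

```scala
// Hypothetical, simplified model of the PATCHED grouping loop.
// Each pending node is only a byte estimate; pending(numNodesInGroup) stands in for nodeStack.top.
// ASSUMPTION: the admission test below is not shown in this commit's diff hunks.
object PatchedGrouping {
  final case class Group(memUsage: Long, numNodesInGroup: Int)

  def selectGroup(pending: IndexedSeq[Long], maxMemoryUsage: Long): Group = {
    var memUsage = 0L
    var numNodesInGroup = 0
    var groupDone = false
    while (numNodesInGroup < pending.length && !groupDone) {
      val nodeMemUsage = pending(numNodesInGroup)
      // memUsage == 0 lets one oversized node through so training can still progress.
      if (memUsage + nodeMemUsage <= maxMemoryUsage || memUsage == 0) {
        numNodesInGroup += 1 // node is popped and added to the group
        memUsage += nodeMemUsage
      } else {
        groupDone = true // the first node that does not fit ends the group
      }
    }
    Group(memUsage, numNodesInGroup)
  }

  // Mirrors the post-loop check: warn only if the group really exceeds the limit.
  def shouldWarn(g: Group, maxMemoryUsage: Long): Boolean = g.memUsage > maxMemoryUsage

  def main(args: Array[String]): Unit = {
    val normal = selectGroup(Vector(100L, 100L, 100L, 100L, 100L), 250L)
    val oversized = selectGroup(Vector(400L, 100L), 250L)
    println(s"$normal warn=${shouldWarn(normal, 250L)}")       // prints: Group(200,2) warn=false
    println(s"$oversized warn=${shouldWarn(oversized, 250L)}") // prints: Group(400,1) warn=true
  }
}
```

In this model the warning fires only when the `memUsage == 0` escape admits a single node larger than the limit. That matches the intent of the comments in the diff.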
