No split found when target/output of RandomForestRegressor is very low.

Hello! I'm running into an issue when I use `RandomForestRegressor` on a dataset whose outputs (`Y`) are very small (on the order of 1e-3, 1e-4 and lower). When the values fall into that range, each `DecisionTree` consists of a single `leaf` that always predicts a constant value, the mean of the desired output. Curiously, simply multiplying the output vector by `100` or `1000` seems to solve the problem, and valid splits can then be found.

I'll try to provide a shareable MWE in the next few days. It might take a while, because the feature set is large and the data is protected, so I will have to do some masking. However, going through the source code, I'm wondering whether the culprit is

https://github.com/JuliaAI/DecisionTree.jl/blob/9dab9c12fcf2d54d4591b23fc87512964fb664b8/src/regression/tree.jl#L90

because the cases where the issue happens coincide with `sum(y) * mean(y) > -1e-7 * length(y) + sum(y .^ 2)` evaluating to `true`. Needless to say, when I multiply everything by a small power of 10, the expression becomes `false`, and valid splits can then be found.
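For what it's worth, here is a minimal sketch of the arithmetic, assuming the expression above is the test being evaluated. Since `sum(y .^ 2) - sum(y) * mean(y)` equals `length(y)` times the population variance of `y`, the expression amounts to comparing that variance against `1e-7`, which depends on the scale of `y`. The names `flagged` and `low_variance` and the sample vector are made up for illustration; they are not part of DecisionTree.jl.

```julia
using Statistics  # mean, var

# The expression quoted above, copied as written; `tol` stands in for the literal 1e-7.
flagged(y; tol = 1e-7) = sum(y) * mean(y) > -tol * length(y) + sum(y .^ 2)

# Algebraically the same test, since
# sum(y .^ 2) - sum(y) * mean(y) == length(y) * var(y; corrected = false)
low_variance(y; tol = 1e-7) = var(y; corrected = false) < tol

# Made-up targets in the problematic range (not my real data).
y = [1e-4, 3e-4, 5e-4, 7e-4, 9e-4]

println(flagged(y), " ", low_variance(y))                  # small targets
println(flagged(1000 .* y), " ", low_variance(1000 .* y))  # same targets, rescaled
```

With these made-up numbers the population variance is 8e-8 before rescaling and 0.08 after, so both forms of the test are `true` for `y` and `false` for `1000 .* y`, which matches what I observe on my data.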

Now, it is my understanding that, to a certain extent, decision trees should be insensitive to any kind of normalization. I expect valid splits to be found even when the outputs are very, very small. Also, skimming through Breiman's original C code (available [here](https://github.com/cran/randomForest/blob/master/src/regTree.c)), I have not found an equivalent of the specific line I linked above (but that could be down to my poor C reading skills).

Can you please clarify the purpose of that line, confirm whether this is intended behavior, and see what can be done to mitigate the issue? (I know the lack of an MWE makes things harder; I just wanted to get the issue registered for now.)

No split found when target/output of RandomForestRegressor is very low. #191