Summary
Under the same model and input, predictions from LightGBM's Python `predict` can differ from those produced by leaves. The root causes were specification mismatches and missing features in leaves:
- RF averaging mismatch: LightGBM averages by the total number of trees when `average_output` is set, while leaves averaged by the number of trees actually used.
- Post-transform coverage: the logistic/softmax/exponential post-transform was not applied for RF models or for JSON-loaded models, even when requested.
- Transform name gap: the `Exponential` name was missing from `TransformType.Name()`, which could lead to `unknown`.
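The averaging mismatch in the first bullet can be sketched in a few lines. This is an illustration of the two rules, not leaves' actual code; the function name and the tree outputs are made up:

```go
package main

import "fmt"

// rfAverage contrasts the two averaging rules for a random-forest model
// when only the first k trees are used. LightGBM divides the partial sum
// by the TOTAL number of trees; the old leaves behavior divided by k.
func rfAverage(treeOutputs []float64, k int) (lightgbm, oldLeaves float64) {
	sum := 0.0
	for _, v := range treeOutputs[:k] {
		sum += v
	}
	lightgbm = sum / float64(len(treeOutputs)) // divide by total trees
	oldLeaves = sum / float64(k)               // divide by trees actually used
	return
}

func main() {
	outputs := []float64{1, 2, 3, 4} // hypothetical outputs of 4 trees
	lg, old := rfAverage(outputs, 2) // predict with the first 2 trees only
	fmt.Println(lg, old)             // 0.75 vs 1.5: same trees, different results
}
```

With all trees used (k equal to the total) the two rules agree, which is why the bug only surfaced when predicting with a subset of trees.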
Expected behavior
When Python's `predict(X, raw_score=..., num_iteration=...)` has a logical counterpart in leaves, using `LGEnsembleFromFile(path, useTransformation)` / `LGEnsembleFromJSON(r, useTransformation)` and `Predict...(nEstimators)`, the numbers should match.
Actual behavior
- With RF models, using only a subset of trees (e.g., via `num_iteration`) caused a mismatch because of different averaging rules.
- Even with `loadTransformation=true`, the RF and JSON paths did not apply the logistic/softmax/exponential transforms, so probabilities did not match Python's.
Reproduction (example)
- Python

```python
import lightgbm as lgb

clf = lgb.Booster(model_file='model.txt')
py = clf.predict(X, raw_score=False, num_iteration=k)
```
- Go (before)

```go
model, _ := leaves.LGEnsembleFromFile("model.txt", true)
_ = model.PredictDense(vals, rows, cols, pred, k, 0)
```
→ RF probabilities differ because the older implementation divided by the number of trees used, not the total number of trees. For JSON models, `LGEnsembleFromJSON` did not apply the transform and returned raw scores.
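The transform-name gap mentioned in the summary can be illustrated with a minimal sketch of a `TransformType.Name()`-style mapping. The enum layout and constant values below are assumptions for illustration, not leaves' actual source; the point is that a missing `Exponential` case falls through to `"unknown"`:

```go
package main

import "fmt"

// TransformType enumerates post-transforms in the style described by the
// issue. The concrete values are hypothetical.
type TransformType int

const (
	Raw TransformType = iota
	Logistic
	Softmax
	Exponential
)

// Name returns a human-readable name for the transform. Before the fix,
// the Exponential case was absent and hit the default branch.
func (t TransformType) Name() string {
	switch t {
	case Raw:
		return "raw"
	case Logistic:
		return "logistic"
	case Softmax:
		return "softmax"
	case Exponential:
		return "exponential" // the previously missing case
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(Exponential.Name()) // "exponential" rather than "unknown"
}
```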