Summary
I would like to add a decision_function() method to LightGBM's scikit-learn API (LGBMClassifier) to expose the model's raw scores (margin / logit). This would allow scikit-learn's CalibratedClassifierCV to use raw scores as calibration inputs instead of falling back to the probabilities produced by predict_proba().
Motivation
In practical applications, it is common to calibrate the posterior probabilities output by LightGBM classification models (probability calibration), e.g., for risk scoring, threshold-based decision making, and cost-sensitive classification.
scikit-learn’s CalibratedClassifierCV requires a continuous “score” from the base estimator as input to the calibrator. According to the documentation, calibration is based on the estimator’s decision_function() output if it exists; otherwise, it uses predict_proba().
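To illustrate this preference concretely, here is a small sklearn-only sketch using a hypothetical toy classifier that records which scoring method the calibrator invokes during fit (the class and the module-level CALLS list are illustrative, not part of any library):

```python
# Sketch: CalibratedClassifierCV uses decision_function() when the base
# estimator defines it, rather than predict_proba().
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.calibration import CalibratedClassifierCV

CALLS = []  # records which method the calibrator actually calls


class RecordingClassifier(BaseEstimator, ClassifierMixin):
    """Toy classifier exposing both scoring methods and logging their use."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        return self

    def decision_function(self, X):
        CALLS.append("decision_function")
        return X[:, 0]  # arbitrary monotone score

    def predict_proba(self, X):
        CALLS.append("predict_proba")
        p = 1.0 / (1.0 + np.exp(-X[:, 0]))
        return np.column_stack([1.0 - p, p])


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)

CalibratedClassifierCV(RecordingClassifier(), method="sigmoid", cv=2).fit(X, y)
# CALLS now contains only "decision_function" entries: the calibrator
# preferred the raw score over predict_proba().
```

Because LGBMClassifier lacks decision_function(), this preference can never kick in for LightGBM today.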
Currently, LGBMClassifier does not implement decision_function(). As a result, when users pass an LGBMClassifier into CalibratedClassifierCV, the calibrator can only use predict_proba() outputs.
However, for LightGBM, predict_proba() returns probabilities obtained after applying a sigmoid (binary classification) or softmax (multiclass classification) transformation to the raw scores (margins). Applying CalibratedClassifierCV(method="sigmoid") on top of these outputs effectively fits another monotonic mapping (a sigmoid calibrator) on already-probabilistic outputs. In practice, this often pushes predicted probabilities further toward the center (i.e., becoming more conservative / “compressed”), which can negatively affect both calibration quality and discriminative power.
Even with method="isotonic", learning an additional monotonic mapping in probability space is less natural than calibrating in raw-margin space. A more principled approach is to let the calibrator learn its mapping (sigmoid or isotonic) directly from the raw scores (margin/logit), producing more reliable probabilities.
Therefore, adding decision_function() to LGBMClassifier to return raw scores would significantly improve compatibility with scikit-learn’s calibration tooling and the quality of calibration.
Description
Add a decision_function(X) method to LGBMClassifier that returns the model's raw scores.
Proposed behavior:
- Binary classification: return the raw margin (logit) for each sample, with shape (n_samples,)
- Multiclass classification: return raw scores with shape (n_samples, n_classes)
The implementation can directly call LightGBM's prediction interface with raw_score=True, e.g. predict(X, raw_score=True) (or an equivalent path), ensuring the output is a raw score rather than a probability.
References
- scikit-learn CalibratedClassifierCV documentation (decision_function preference):
  https://scikit-learn.org/stable/modules/generated/sklearn.calibration.CalibratedClassifierCV.html
  "The calibration is based on the decision_function method of the estimator if it exists, else on predict_proba."