
[python-package] Add decision_function() to LGBMClassifier #7158

@CoderWota


Summary

I would like to add a decision_function() interface to LightGBM’s scikit-learn API (LGBMClassifier) to expose the model’s raw scores (margin / logit). This would allow scikit-learn’s CalibratedClassifierCV to prefer raw scores as calibration inputs, instead of falling back to probabilities produced by predict_proba().

Motivation

In practical applications, it is common to calibrate the posterior probabilities output by LightGBM classification models (probability calibration), e.g., for risk scoring, threshold-based decision making, and cost-sensitive classification.

scikit-learn’s CalibratedClassifierCV requires a continuous “score” from the base estimator as input to the calibrator. According to the documentation, calibration is based on the estimator’s decision_function() output if it exists; otherwise, it uses predict_proba().

Currently, LGBMClassifier does not implement decision_function(). As a result, when users pass an LGBMClassifier into CalibratedClassifierCV, the calibrator can only use predict_proba() outputs.

However, for LightGBM, predict_proba() returns probabilities obtained after applying a sigmoid (binary classification) or softmax (multiclass classification) transformation to the raw scores (margins). Applying CalibratedClassifierCV(method="sigmoid") on top of these outputs effectively fits another monotonic mapping (a sigmoid calibrator) on already-probabilistic outputs. In practice, this often pushes predicted probabilities further toward the center (i.e., becoming more conservative / “compressed”), which can negatively affect both calibration quality and discriminative power.

Even with method="isotonic", learning an additional monotonic mapping in probability space is generally less clean and natural than calibrating directly in raw-margin space. A more principled approach is to let the calibrator learn a mapping (sigmoid or isotonic) directly from raw scores (margin/logit), producing more reliable probabilities.

Therefore, adding decision_function() to LGBMClassifier to return raw scores would significantly improve compatibility with scikit-learn’s calibration tooling and the quality of calibration.

Description

Add a decision_function(X) method to LGBMClassifier that returns the model’s raw scores (margins).

Proposed behavior:

  • Binary classification: return the raw margin (logit) for each sample
  • Multiclass classification: return raw scores with shape (n_samples, n_classes)

Implementation can directly call LightGBM’s prediction interface with raw_score=True, e.g.:

  • predict(X, raw_score=True) (or an equivalent path), ensuring the output is a raw score rather than a probability.

References

From the scikit-learn documentation for CalibratedClassifierCV: “The calibration is based on the decision_function method of the estimator if it exists, else on predict_proba.”
