feat: Add a predict_proba method on SKLearnClassifier #21556
base: master
@@ -10,6 +10,7 @@
 from keras.src.wrappers.fixes import type_of_target
 from keras.src.wrappers.utils import TargetReshaper
 from keras.src.wrappers.utils import _check_model
+from keras.src.wrappers.utils import _estimator_has_proba
 from keras.src.wrappers.utils import assert_sklearn_installed

 try:

@@ -18,6 +19,7 @@
     from sklearn.base import ClassifierMixin
     from sklearn.base import RegressorMixin
     from sklearn.base import TransformerMixin
+    from sklearn.utils.metaestimators import available_if
 except ImportError:
     sklearn = None
@@ -278,6 +280,15 @@ def dynamic_model(X, y, loss, layers=[10]):
     ```
     """

+    @available_if(_estimator_has_proba)
+    def predict_proba(self, X):
+        """Predict class probabilities of the input samples X."""
+        from sklearn.utils.validation import check_is_fitted
+
+        check_is_fitted(self)
+        X = _validate_data(self, X, reset=False)
+        return self.model_.predict(X)
Review comment:

are these probabilities? Or is this more of a
Review comment (reply):

Reproducible example in Google Colab:

```python
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
import random
import numpy as np
import tensorflow as tf

random.seed(42)
np.random.seed(42)

X, y = make_classification(n_samples=1000, n_features=10, n_informative=4, n_classes=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

inp = Input(shape=(10,))
x = Dense(20, activation="relu")(inp)
x = Dense(20, activation="relu")(x)
x = Dense(20, activation="relu")(x)
logits_output = Dense(4, activation=None)(x)

model_logits = Model(inp, logits_output)
model_logits.compile(loss=SparseCategoricalCrossentropy(from_logits=True), optimizer="adam")
model_logits.fit(X_train, y_train, epochs=10, verbose=0)

softmax_output = tf.keras.layers.Activation("softmax")(logits_output)
model_softmax = Model(inp, softmax_output)

test_sample = X_test[:1]
print("LOGITS OUTPUT:")
pred_logits = model_logits.predict(test_sample, verbose=0)
print(pred_logits)
print("SOFTMAX MODEL OUTPUT:")
pred_softmax = model_softmax.predict(test_sample, verbose=0)
print(pred_softmax)
print("MANUAL SOFTMAX APPLIED TO LOGITS:")
pred_manual_softmax = tf.nn.softmax(pred_logits).numpy()
print(pred_manual_softmax)
print("DIFFERENCE")
print(np.abs(pred_softmax - pred_manual_softmax))
```
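A quick, self-contained way to illustrate the distinction the example above is probing: probabilities are non-negative and each row sums to 1, while raw logits generally are not. The helper name below is hypothetical (it is not part of the PR), and this is only a heuristic sketch:

```python
import numpy as np

def looks_like_probabilities(outputs, tol=1e-3):
    # Hypothetical helper (not in the PR): heuristically decide whether a
    # batch of model outputs already looks like class probabilities.
    outputs = np.asarray(outputs)
    nonnegative = bool(np.all(outputs >= 0))
    rows_sum_to_one = bool(np.allclose(outputs.sum(axis=-1), 1.0, atol=tol))
    return nonnegative and rows_sum_to_one

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, 0.5, -1.0, 0.1]])
print(looks_like_probabilities(logits))           # False: raw logits
print(looks_like_probabilities(softmax(logits)))  # True: rows sum to 1
```

Such a runtime check can only tell the two apart heuristically; it cannot prove a model was trained to output calibrated probabilities.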
    def _process_target(self, y, reset=False):
        """Classifiers do OHE."""
        target_type = type_of_target(y, raise_unknown=True)
Review comment:

The problem here is that since the model is configurable, we have no way to know whether the model outputs probabilities or not. This method serves no additional purpose over just predict().
Review comment (reply):

@fchollet Could you elaborate, please? I'm not sure I understand your comment. If the user expects probabilities, then they will get them. The only difference between this predict_proba and predict is that the target is not transformed back. Although predict_proba might not always return proper probabilities, its inclusion allows users to interoperate with sklearn workflows that expect it. Some examples are in the original issue request.
Review comment:

I had a chat with Adrin Jalali and @glemaitre, and he suggested we put predict_proba under an available_if decorator. Maybe something like

Also pinging @adrinjalali for his thoughts on the issue/PR.
Review comment:

Yes, you basically need something of the same nature as here:
https://github.com/scikit-learn/scikit-learn/blob/d5c3201291e73e6f3dd6847e3f80557370e8f24c/sklearn/model_selection/_search.py#L602-L603