
Conversation

@qomhmd commented Jan 4, 2025

In the original package, the ROC-AUC score is calculated using y_preds (predicted class labels) instead of y_scores (predicted probabilities or decision scores). This pull request resolves the issue for classifiers.

@qomhmd mentioned this pull request Jan 5, 2025
@shankarpandala (Owner)

Review Status: Needs Investigation

This PR claims to fix the ROC-AUC calculation by using y_score instead of y_pred.

⚠️ Questions:

  1. Can you provide evidence that the current implementation is incorrect?
  2. What specific issue does this fix? (test case, example, or reproducible error)
  3. The PR has +110 -75 changes across 5 files - what else was modified beyond ROC-AUC?

Next Steps:

@qomhmd Please provide:

  1. Minimal reproducible example showing the bug
  2. Explanation of what other changes are in this PR
  3. Unit tests demonstrating the fix
  4. Rebase on current dev branch (PR is 10 months old)

Current ROC-AUC Implementation:

The current code uses roc_auc_score, which accepts:

  • Binary classification: class predictions or probability/decision scores for the positive class
  • Multi-class: requires predict_proba output together with the multi_class parameter

Need to verify if this is actually a bug or if the current implementation is correct for the use case.
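
For reference, here is a rough sketch (toy data, not code from this repository) of what scikit-learn's roc_auc_score expects in each case:

# Illustrative sketch with made-up data; not taken from this repository.
import numpy as np
from sklearn.metrics import roc_auc_score

# Binary: the second argument should be a score for the positive class,
# e.g. predict_proba(X)[:, 1] or decision_function(X) output.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])
roc_auc_score(y_true, y_score)  # 0.75

# Multi-class: requires an (n_samples, n_classes) probability matrix
# plus the multi_class argument ('ovr' or 'ovo').
y_true_mc = np.array([0, 1, 2, 2])
y_proba_mc = np.array([[0.8, 0.1, 0.1],
                       [0.2, 0.6, 0.2],
                       [0.1, 0.2, 0.7],
                       [0.2, 0.2, 0.6]])
roc_auc_score(y_true_mc, y_proba_mc, multi_class="ovr", average="weighted")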

Decision: On Hold pending author clarification

@shankarpandala (Owner)

Closing as this issue has already been fixed in the current codebase.

Current Implementation (lines 585-610 in Supervised.py)

The ROC-AUC calculation already uses probabilities, not predictions:

# Use predict_proba for ROC-AUC calculation instead of class labels
if hasattr(pipe, "predict_proba"):
    y_pred_proba = pipe.predict_proba(X_test)
    # For binary classification, use probabilities of positive class
    if y_pred_proba.shape[1] == 2:
        roc_auc = roc_auc_score(y_test, y_pred_proba[:, 1])
    else:
        # For multiclass, use one-vs-rest with probabilities
        roc_auc = roc_auc_score(y_test, y_pred_proba, multi_class='ovr', average='weighted')
elif hasattr(pipe, "decision_function"):
    # For models without predict_proba but with decision_function
    y_pred_score = pipe.decision_function(X_test)
    roc_auc = roc_auc_score(y_test, y_pred_score)
else:
    # Fallback to class labels if neither method is available
    roc_auc = roc_auc_score(y_test, y_pred)

This handles:

  • ✅ Binary classification with probabilities
  • ✅ Multiclass with OVR strategy
  • ✅ Models with decision_function
  • ✅ Fallback for models without probability estimates
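
For illustration, a hedged sketch (toy dataset and estimator choices assumed, not repository code) of which branch each kind of model hits:

# Sketch of the predict_proba / decision_function fallback with common
# scikit-learn estimators; illustrative only, not code from this repository.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# LogisticRegression exposes predict_proba -> first branch.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
auc_proba = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

# LinearSVC has decision_function but no predict_proba -> second branch.
svm = LinearSVC().fit(X_train, y_train)
auc_margin = roc_auc_score(y_test, svm.decision_function(X_test))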

Conclusion

The fix you proposed has already been implemented (or was implemented independently). The current code correctly uses y_pred_proba instead of y_pred for ROC-AUC calculation.
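
As a minimal reproducible check (a toy sketch on the breast-cancer dataset, not from our test suite), the label-based and probability-based scores really do differ, which is why the probability path matters:

# Toy comparison of ROC-AUC computed from hard labels vs. probabilities;
# illustrative sketch only, not taken from this repository's tests.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

auc_from_labels = roc_auc_score(y_test, model.predict(X_test))
auc_from_proba = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# The label-based value collapses the ranking information in the
# probabilities, so the two generally differ; the probability-based
# value is the conventional ROC-AUC.
print(auc_from_labels, auc_from_proba)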

Thank you for identifying this issue - it's already resolved in the current version!
