TST: Add sklearn <-> skglm match tests for Poisson and Gamma predictions #323
Conversation
        
          
skglm/estimators.py (Outdated)
        
        indices = scores.argmax(axis=1)
        return self.classes_[indices]
    -   elif isinstance(self.datafit, (Poisson, PoissonGroup)):
    +   elif isinstance(self.datafit, (Poisson, PoissonGroup, Gamma)):
    if hasattr(self.datafit, "inverse_link_function"):
        return self.datafit.inverse_link_function(self._decision_function(X))
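The `hasattr` dispatch suggested above can be sketched as follows. This is an illustrative standalone sketch, not skglm's actual code: the datafit class names are hypothetical, and only the dispatch pattern is the point.

```python
import numpy as np

class GaussianLikeDatafit:
    """Hypothetical identity-link datafit: defines no inverse link."""

class PoissonLikeDatafit:
    """Hypothetical log-link datafit exposing an inverse link."""
    @staticmethod
    def inverse_link_function(Xw):
        return np.exp(Xw)

def predict_values(datafit, Xw):
    # Dispatch on the presence of an inverse link instead of listing
    # every log-link datafit in an isinstance() check.
    if hasattr(datafit, "inverse_link_function"):
        return datafit.inverse_link_function(Xw)
    return Xw

Xw = np.array([0.0, 1.0])
print(predict_values(PoissonLikeDatafit(), Xw))   # exp of the linear predictor
print(predict_values(GaussianLikeDatafit(), Xw))  # returned unchanged
```

New datafits then opt in to a transformed prediction simply by defining the method, with no change to the estimator.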
        
          
skglm/tests/test_estimators.py (Outdated)
          
        
        assert isinstance(res, str)

    +   def test_poisson_predictions_match_sklearn():
Merge these into a single parametrized test, test_inverse_link_prediction.
@Badr-MOUFAD @mathurinm Ready for review!
@Badr-MOUFAD this is the last one we need to release 0.5, WDYT?
Thanks for the great work guys 🙏
I have two comments:
1. What do you think about making inverse_link the identity by default, to have a consistent API?

    @staticmethod
    def inverse_link(x):
        return x

2. The inverse_link for Logistic is missing; is there a reason for not implementing it?
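The identity-default idea can be sketched like this. The class names and overriding pattern below are illustrative only, not skglm's actual class hierarchy:

```python
import numpy as np

class BaseDatafit:
    # Proposed default: identity inverse link, so predict() can call
    # datafit.inverse_link() unconditionally, with no hasattr() check.
    @staticmethod
    def inverse_link(Xw):
        return Xw

class PoissonLike(BaseDatafit):
    # Log-link datafit: predictions are exp() of the linear predictor.
    @staticmethod
    def inverse_link(Xw):
        return np.exp(Xw)

Xw = np.array([0.0, 1.0, 2.0])
print(BaseDatafit.inverse_link(Xw))   # identity: linear predictor unchanged
print(PoissonLike.inverse_link(Xw))   # elementwise exponential
```

With this default, every datafit exposes the same interface and the estimator's prediction path has a single code branch.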
@Badr-MOUFAD Regarding question 2:
Agree with @floriankozikowski, the prediction logic for classification is different.
        
          
skglm/datafits/base.py (Outdated)
          
        
      | """Base class for datafits.""" | ||
|  | ||
| @staticmethod | ||
| def inverse_link(x): | 
Just call the argument Xw for clarity.
        
          
skglm/datafits/group.py (Outdated)
          
        
        self.grp_ptr, self.grp_indices = grp_ptr, grp_indices

    +   @staticmethod
    +   def inverse_link(x):
same
        
          
skglm/datafits/single_task.py (Outdated)
          
        
        pass

    +   @staticmethod
    +   def inverse_link(x):
same
        
          
skglm/datafits/single_task.py (Outdated)
          
        
        pass

    +   @staticmethod
    +   def inverse_link(x):
same
        )
    +   def test_inverse_link_prediction(sklearn_reg, skglm_datafit, y_gen):
    +       np.random.seed(42)
    +       X = np.random.randn(20, 5)
One last thing: IMO it makes sense to run the test on completely random values of y. They don't have to fit the model well; they could be random integers between 0 and 5. We're not checking statistical validity, we're checking that the optimizer works well and that we return the same thing as sklearn. This would make the test simpler.
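The random-y idea can be illustrated on the sklearn side alone (the skglm half of the comparison is omitted here to avoid assuming its API). Both PoissonRegressor and GammaRegressor use a log link, so their predict() is the exp() of the linear predictor, which is exactly the transform the skglm predictions must match:

```python
import numpy as np
from sklearn.linear_model import GammaRegressor, PoissonRegressor

rng = np.random.default_rng(42)
X = rng.standard_normal((20, 5))
# Random integer targets: they need not fit the model well, since the
# test only checks that both libraries return the same predictions.
y = rng.integers(1, 6, size=20).astype(float)  # strictly positive, valid for Gamma too

for Regressor in (PoissonRegressor, GammaRegressor):
    model = Regressor(alpha=0.1).fit(X, y)
    # Log link: predict() is exp() of the linear predictor, the same
    # transform skglm applies for its log-link datafits.
    manual = np.exp(X @ model.coef_ + model.intercept_)
    assert np.allclose(model.predict(X), manual)
```

In the parametrized test, the skglm estimator's predictions would be compared against model.predict(X) with np.allclose.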
Thanks @floriankozikowski!
Follows up on #321 : Add unit tests verifying sklearn prediction compatibility
This PR addresses the request from @mathurinm to add unit tests ensuring that skglm's Poisson and Gamma estimators produce the same predictions as sklearn on simple data.
These tests validate that the prediction fix from #321 (applying an exp() transform for log-link datafits) correctly matches sklearn's behavior.