-
Notifications
You must be signed in to change notification settings - Fork 4k
[python-package] scikit-learn fit() methods: add eval_X, eval_y, deprecate eval_set #6857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
197a3bd to
5dd3171
Compare
jameslamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! It's looking like you're struggling to get this passing CI, so I'm going to put into "draft" for now. @ me any time here if you need help with development, and we can open this back up for review once CI is passing.
I saw you had multiple commits responding to linting errors... here's how to run those locally for faster feedback:
# (or conda, whatever you want)
pip install pre-commit
pre-commit run --all-filesAnd here's how to build locally and run the tests:
# step 1: compile lib_ligihgbtm
# (only need to do this once, because you're not making any C/C++ changes)
cmake -B build -S .
cmake --build build --target _lightgbm -j4
# step 2: install the Python package, re-using it
# (do this every time you change Python code in the library)
sh build-python.sh install --precompile
# step 3: run the scikit-learn tests
pytest tests/python_package_test/test_sklearn.py|
@jameslamb Thanks for your suggestions. |
Making both options available for a time and raising a deprecation warning when I'm sorry but I cannot invest much time in this right now (for example, looking into whether this would introduce inconsistencies with
No, please. As I said in scikit-learn/scikit-learn#28901 (comment), removing |
|
@jameslamb I'm sorry, I really need a maintainer's help. The tests in Details |
|
@jameslamb Thank you so much. Pinning |
|
The remaining CI failures seem unrelated. |
@lorentzenchr if you are interested in continuing this I'd be happy to help with reviews. I'm supportive of adding this, for better compatibility with newer versions of |
|
@jameslamb Yes, I‘d like to finish this. Your review would be great. Anything from me you need before you can start reviewing? |
|
Great! I'd been waiting to review until you were done adding whatever tests you wanted. If you'd like a review before then, update this to latest |
| np.testing.assert_allclose(gbm1.predict(X), gbm2.predict(X)) | ||
| assert gbm1.evals_result_["valid_0"]["l2"][0] == pytest.approx(gbm2.evals_result_["valid_0"]["l2"][0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(changed by me in d7e0fff)
Added this check on the validation results. Just checking the predicted values is not sufficient to test that the passed validation sets were actually used.
| X_test1, X_test2 = X_test[: n // 2], X_test[n // 2 :] | ||
| y_test1, y_test2 = y_test[: n // 2], y_test[n // 2 :] | ||
| gbm1 = lgb.LGBMRegressor(**params) | ||
| with pytest.warns(LGBMDeprecationWarning, match="The argument 'eval_set' is deprecated.*"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(changed by me in d7e0fff)
Added these with pytest.warns() to suppress these warnings in test logs.
It's still valuable, I think, to have the standalone test_eval_set_deprecation test.
| assert set(gbm2.evals_result_.keys()) == {"valid_0", "valid_1"}, ( | ||
| f"expected 2 validation sets in evals_result_, got {gbm2.evals_result_.keys()}" | ||
| ) | ||
| assert gbm1.evals_result_["valid_0"]["l2"][0] == pytest.approx(gbm2.evals_result_["valid_0"]["l2"][0]) | ||
| assert gbm1.evals_result_["valid_1"]["l2"][0] == pytest.approx(gbm2.evals_result_["valid_1"]["l2"][0]) | ||
| assert gbm2.evals_result_["valid_0"]["l2"] != gbm2.evals_result_["valid_1"]["l2"], ( | ||
| "Evaluation results for the 2 validation sets are not different. This might mean they weren't both used." | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(changed by me in d7e0fff)
Expanded these tests so they could catch more possible bugs, like:
- one of the validation sets was ignored
- the same validation set was referenced multiple times (instead of them being considered separately)
| gbm.fit(X_train, y_train, eval_X=(X_test,) * 3, eval_y=(y_test,) * 2) | ||
|
|
||
|
|
||
| def test_ranker_eval_set_raises(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(changed by me in d7e0fff)
This test cases checks all of the validation of eval_group that I shuffled around (see https://github.com/microsoft/LightGBM/pull/6857/files#r2650232387)
|
Ok done adding inline comment threads. One other note... as you probably expect, the |
|
I've updated this to latest |
StrikerRUS
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lorentzenchr Thank you so much for your hard work here! Generally LGTM! But please consider checking some my minor comments below.
python-package/lightgbm/sklearn.py
Outdated
| eval_set : list or None, optional (default=None) (deprecated) | ||
| A list of (X, y) tuple pairs to use as validation sets. | ||
| This is deprecated, use `eval_X` and `eval_y` instead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that deprecated directive will suit better.
https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html#directive-deprecated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this in feafe22
And looks like it's now .. version-deprecated:: not .. deprecated::
Changed in version 9.0: The deprecated directive was renamed to version-deprecated. The previous name is retained as an alias
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait re-reading that.... Sphinx 9.0 is very new (November 2025).
I just pushed 2924d68 switching back to .. deprecated.
I'll test using Sphinx 9.x in a separate PR, on a LightGBM branch where we can test how it's rendered on readthedocs.
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Co-authored-by: James Lamb <jaylamb20@gmail.com>
|
Ok @StrikerRUS I think I've addressed all comments, could you take another look and merge this if you approve? |
|
@jameslamb @StrikerRUS Thanks for your support on this PR. |
As discussed in scikit-learn/scikit-learn#28901 (comment), this PR adds
eval_Xandeval_yin order to make LGBM estimators compatible with scikit-learn's (as of version 1.6)Pipeline(..., transform_input=["eval_X"]).See also scikit-learn/scikit-learn#27124.