-
Notifications
You must be signed in to change notification settings - Fork 187
chore: Refactor SingleColumnTransformer so it's in its own file #1820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ted imports and functions
|
Hi @techy4shri, thanks a lot for this PR! It's already in a good shape. The tests are failing because doctests are checking that the examples in the documentations are passing. Doctests are failing in some places because they are expecting the RejectColumn exception to still be in skrub._apply_to_cols, while now it has been moved to the new file. Those doctests should therefore be updated with the new path to not fail: skrub._apply_to_cols.RejectColumn: should become skrub_single_column_transformer.RejectColumn: Also, could you move the tests on the single_column_transformer to their own separate test file? |
|
Understood @rcap107 ! Will do! Thanks for the confirmation. |
|
@rcap107 the PR is ready for review, kindly let me know if any changes are required. |
rcap107
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks a lot @techy4shri!
Dismissing to give some more time to other maintainers
|
okk thanks! I shall be waiting for the final judgement 🫡 @jeromedockes do let me know your thoughts too 👀 |
jeromedockes
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for all this work @techy4shri ! 🚀
skrub/_clean_null_strings.py
Outdated
| Traceback (most recent call last): | ||
| ... | ||
| skrub._apply_to_cols.RejectColumn: Could not convert column 's' to numbers. | ||
| skrub._single_column_transformer.RejectColumn: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should not wrap this line but keep it identical to the actual output (no line break before 'Could not convert' ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be the formatter's doing, let me fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, still getting that when i try to make commits:
E501 Line too long (91 > 88)
--> skrub\_clean_null_strings.py:165:89
|
163 | Traceback (most recent call last):
164 | ...
165 | skrub._single_column_transformer.RejectColumn: Could not convert column 's' to numbers.
| ^^^
166 | >>> ToFloat().fit_transform(cleaner.fit_transform(s))
167 | 0 1.1
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and the pre-commit check fails :)
| " estimators, its ``fit``, ``transform`` and ``fit_transform`` methods expect a" | ||
| " single column (a pandas or polars Series) rather than a full dataframe. To apply" | ||
| " this transformer to one or more columns in a dataframe, use it as a parameter in" | ||
| " a ``skrub.ApplyToCols`` or a ``skrub.TableVectorizer``." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| from sklearn.pipeline import make_pipeline | ||
| from sklearn.preprocessing import OneHotEncoder | ||
|
|
||
| from skrub import _dataframe as sbd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(not in this PR) this file should be renamed test_apply_to_cols
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did think it was a bit confusing, happy to know this is supposed to be that 😭
|
Pre-commit check will fail now because of line length. |
Closes #1729
Now the new file has separate transformer classes, the tests all but pass. I refrained from making a new test file since
test_on_each_column.pyhas bothSingleColumnTransformerandRejectColumncovered in it.Let me know if any changes are required.
Note: The current implementation is such that both SingleColumnTransformer and RejectColumn are added as an internal libs rather than being part of the public API. I read the issue and the long discussion about making them both public in this issue discussion but it seems to me that the internal team would handle it later, after the issue I am assigned to will be closed. Just putting it here :)