Skip to content

Conversation

@techy4shri
Copy link

@techy4shri techy4shri commented Dec 21, 2025

Closes #1729

  • Refactored SingleColumnTransformer in a separate file
  • Updated imports everywhere accordingly
  • Updated Doctests to remove the documentation example errors

Now the new file has separate transformer classes, the tests all but pass. I refrained from making a new test file since test_on_each_column.py has both SingleColumnTransformer and RejectColumn covered in it.
Let me know if any changes are required.
Note: The current implementation is such that both SingleColumnTransformer and RejectColumn are added as an internal libs rather than being part of the public API. I read the issue and the long discussion about making them both public in this issue discussion but it seems to me that the internal team would handle it later, after the issue I am assigned to will be closed. Just putting it here :)

@rcap107
Copy link
Member

rcap107 commented Dec 22, 2025

Hi @techy4shri, thanks a lot for this PR! It's already in a good shape.

The tests are failing because doctests are checking that the examples in the documentations are passing. Doctests are failing in some places because they are expecting the RejectColumn exception to still be in skrub._apply_to_cols, while now it has been moved to the new file. Those doctests should therefore be updated with the new path to not fail:

    skrub._apply_to_cols.RejectColumn: 

should become

    skrub_single_column_transformer.RejectColumn: 

Also, could you move the tests on the single_column_transformer to their own separate test file?

@techy4shri
Copy link
Author

Understood @rcap107 ! Will do! Thanks for the confirmation.

@techy4shri
Copy link
Author

@rcap107 the PR is ready for review, kindly let me know if any changes are required.

rcap107
rcap107 previously approved these changes Jan 6, 2026
Copy link
Member

@rcap107 rcap107 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks a lot @techy4shri!

@rcap107 rcap107 dismissed their stale review January 6, 2026 15:20

Dismissing to give some more time to other maintainers

@techy4shri
Copy link
Author

techy4shri commented Jan 6, 2026

okk thanks! I shall be waiting for the final judgement 🫡 @jeromedockes do let me know your thoughts too 👀

Copy link
Member

@jeromedockes jeromedockes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for all this work @techy4shri ! 🚀

Traceback (most recent call last):
...
skrub._apply_to_cols.RejectColumn: Could not convert column 's' to numbers.
skrub._single_column_transformer.RejectColumn:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not wrap this line but keep it identical to the actual output (no line break before 'Could not convert' ...)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be the formatter's doing, let me fix that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, still getting that when i try to make commits:

E501 Line too long (91 > 88)
   --> skrub\_clean_null_strings.py:165:89
    |
163 |     Traceback (most recent call last):
164 |         ...
165 |     skrub._single_column_transformer.RejectColumn: Could not convert column 's' to numbers.
    |                                                                                         ^^^
166 |     >>> ToFloat().fit_transform(cleaner.fit_transform(s))
167 |     0    1.1
    |

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and the pre-commit check fails :)

" estimators, its ``fit``, ``transform`` and ``fit_transform`` methods expect a"
" single column (a pandas or polars Series) rather than a full dataframe. To apply"
" this transformer to one or more columns in a dataframe, use it as a parameter in"
" a ``skrub.ApplyToCols`` or a ``skrub.TableVectorizer``."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rcap107 #1826 will need updating after merging this PR

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

from skrub import _dataframe as sbd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not in this PR) this file should be renamed test_apply_to_cols

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did think it was a bit confusing, happy to know this is supposed to be that 😭

@techy4shri
Copy link
Author

Pre-commit check will fail now because of line length.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CHORE - Refactor SingleColumnTransformer so it's in its own file

3 participants