feat: Expose additional data handlers as an argument in train #409
Thanks for making a pull request! 😃
ab68156 to d06f864
Signed-off-by: Dushyant Behl <[email protected]>
056aede to 64bf80a
@Abhishek-TAMU @willmj @ashokponkumar This PR is now ready for review and includes unit tests for the additional data handler feature.
64bf80a to abad545
ashokponkumar
left a comment
A few nits.
ashokponkumar
left a comment
Requested a few nits.
3307565 to 90d9ce8
willmj
left a comment
LGTM, but before merging let's add some docstrings to test cases to explain what they are testing
            {"thisisfine": "thisisnot"},
        ],
    )
    def test_run_with_bad_additional_data_handlers(additional_handlers):
nit: docstrings for all added test cases
Added comments. Thanks @willmj
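For readers following the thread, here is a rough sketch of what a documented test for malformed handlers might look like. It is not the code from this PR: `validate_handlers` is a hypothetical stand-in for the check performed by the data processor, the first entry in `BAD_HANDLER_INPUTS` is made up for illustration, and the loop replaces the parametrization used in the real test so that the sketch runs without pytest.

```python
def validate_handlers(handlers):
    """Hypothetical stand-in for the handler check; not the library's function."""
    if handlers is None:
        return
    if not isinstance(handlers, dict):
        raise ValueError("Handlers should be of type Dict, str to callable")
    for name, func in handlers.items():
        if not isinstance(name, str) or not callable(func):
            raise ValueError("Handlers should be of type Dict, str to callable")


BAD_HANDLER_INPUTS = [
    "thisisnotokay",              # illustrative only: not a dict at all
    {"thisisfine": "thisisnot"},  # from the PR: the value is not callable
]


def test_run_with_bad_additional_data_handlers():
    """Ensure that malformed additional data handlers are rejected with ValueError."""
    for bad_input in BAD_HANDLER_INPUTS:
        try:
            validate_handlers(bad_input)
        except ValueError:
            continue  # expected: the malformed input was rejected
        raise AssertionError(f"{bad_input!r} was accepted but should be rejected")
```

The docstring states the behavior under test in one sentence, which is what the review nit asks for.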
Signed-off-by: Dushyant Behl <[email protected]>
6310c5d
90d9ce8 to 6310c5d
Abhishek-TAMU
left a comment
Looks good to me, other than a few changes!
tuning/data/data_processors.py
Outdated
    if not isinstance(name, str) or not callable(func):
        raise ValueError("Handlers should be of type Dict, str to callable")
    if name in self.registered_handlers:
        logging.warning("Handler name %s existed is being overwritten", name)
Perhaps this wording, for better readability:
logging.warning("Handler name '%s' already exists and will be overwritten", name)
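To put the suggested message in context, the following is a minimal sketch of how the registration check could fit together. Only `registered_handlers`, the `ValueError` text, and the warning wording come from this thread; the `HandlerRegistry` class and the `register_data_handlers` method name are assumptions made for illustration and may not match the real data processor.

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger(__name__)


class HandlerRegistry:
    """Hypothetical stand-in for the data processor's handler bookkeeping."""

    def __init__(self) -> None:
        self.registered_handlers: Dict[str, Callable] = {}

    def register_data_handlers(
        self, handlers: Optional[Dict[str, Callable]]
    ) -> None:
        # None means "no extra handlers"; nothing to register.
        if handlers is None:
            return
        if not isinstance(handlers, dict):
            raise ValueError("Handlers should be of type Dict, str to callable")
        for name, func in handlers.items():
            # Each entry must map a string name to something callable.
            if not isinstance(name, str) or not callable(func):
                raise ValueError(
                    "Handlers should be of type Dict, str to callable"
                )
            if name in self.registered_handlers:
                logger.warning(
                    "Handler name '%s' already exists and will be overwritten",
                    name,
                )
            self.registered_handlers[name] = func
```

Passing `name` as a logging argument rather than formatting it into the string defers the formatting until the record is actually emitted, and the quotes around `%s` make an empty or whitespace-only name visible in the log.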
        train_args,
        PEFT_PT_ARGS,
        additional_data_handlers=None,
    )
We can just add this line after this:
_validate_training(tempdir)
        train_args,
        PEFT_PT_ARGS,
        additional_data_handlers={TEST_HANDLER: test_handler},
    )
Here also, we can just add this line after this:
_validate_training(tempdir)
Signed-off-by: Abhishek <[email protected]>
@willmj @dushyantbehl I just pushed the changes I suggested, along with some docstring changes suggested by Will. Looks good to merge now.
Description of the change
Add an argument to train that accepts user-specified data handlers. These handlers are registered with the data preprocessor and can then be invoked from a custom data config. This is intended for advanced users.
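As a rough illustration of the idea, the sketch below passes a custom handler by name. The `train` function here is a simplified stand-in written for this example, not the real entry point (which also takes model, data, and training arguments). The handler signature, the `handler_name` parameter, and the `add_greeting` handler are likewise assumptions; only the `additional_data_handlers` keyword and its name-to-callable dict shape come from the PR.

```python
from typing import Any, Callable, Dict, List, Optional


def add_greeting(element: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
    """Illustrative handler: return a copy of the record with one added field."""
    return {**element, "greeting": "hello " + str(element["name"])}


def train(
    dataset: List[Dict[str, Any]],
    handler_name: str,
    additional_data_handlers: Optional[Dict[str, Callable]] = None,
) -> List[Dict[str, Any]]:
    """Stand-in for train: registers the handlers, then applies one by name.

    In the real project the handler name would come from the data config;
    here it is passed directly to keep the sketch self-contained.
    """
    handlers = dict(additional_data_handlers or {})
    if handler_name not in handlers:
        raise KeyError(f"No handler registered under '{handler_name}'")
    return [handlers[handler_name](row) for row in dataset]


processed = train(
    [{"name": "a"}, {"name": "b"}],
    handler_name="add_greeting",
    additional_data_handlers={"add_greeting": add_greeting},
)
```

The point of the dict is that the config refers to a handler only by its string name, so user code can supply new preprocessing logic without modifying the library.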
Related issue number
How to verify the PR
Was the PR tested