Skip to content

Conversation

@dushyantbehl
Copy link
Collaborator

@dushyantbehl dushyantbehl commented Dec 8, 2024

Description of the change

Add an argument in train to take in any user specified data handlers to be registered with the data preprocessor which can be invoked by a custom data config for advanced users.

Related issue number

How to verify the PR

Was the PR tested

  • I have added >=1 unit test(s) for every new method I have added.
  • I have ensured all unit tests pass

@github-actions
Copy link

github-actions bot commented Dec 8, 2024

Thanks for making a pull request! 😃
One of the maintainers will review and advise on the next steps.

@github-actions github-actions bot added the feat label Dec 8, 2024
@dushyantbehl dushyantbehl force-pushed the additional_data_handlers branch 3 times, most recently from ab68156 to d06f864 Compare December 9, 2024 05:24
@dushyantbehl dushyantbehl force-pushed the additional_data_handlers branch from 056aede to 64bf80a Compare December 12, 2024 15:26
@dushyantbehl dushyantbehl marked this pull request as ready for review December 12, 2024 15:26
@dushyantbehl
Copy link
Collaborator Author

@Abhishek-TAMU @willmj @ashokponkumar This PR is ready for review now and has unit tests for the additional data handler feature integrated.

@dushyantbehl dushyantbehl force-pushed the additional_data_handlers branch from 64bf80a to abad545 Compare December 12, 2024 15:38
Copy link
Collaborator

@ashokponkumar ashokponkumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few nits

Copy link
Collaborator

@ashokponkumar ashokponkumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requested few nits

@dushyantbehl dushyantbehl force-pushed the additional_data_handlers branch 2 times, most recently from 3307565 to 90d9ce8 Compare December 12, 2024 19:16
ashokponkumar
ashokponkumar previously approved these changes Dec 12, 2024
willmj
willmj previously approved these changes Dec 12, 2024
Copy link
Collaborator

@willmj willmj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but before merging let's add some docstrings to test cases to explain what they are testing

{"thisisfine": "thisisnot"},
],
)
def test_run_with_bad_additional_data_handlers(additional_handlers):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: docstrings for all added test cases

Copy link
Collaborator Author

@dushyantbehl dushyantbehl Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments. Thanks @willmj

@dushyantbehl dushyantbehl dismissed stale reviews from willmj and ashokponkumar via 6310c5d December 12, 2024 20:59
@dushyantbehl dushyantbehl force-pushed the additional_data_handlers branch from 90d9ce8 to 6310c5d Compare December 12, 2024 20:59
Copy link
Collaborator

@Abhishek-TAMU Abhishek-TAMU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, other than just few changes!

if not isinstance(name, str) or not callable(func):
raise ValueError("Handlers should be of type Dict, str to callable")
if name in self.registered_handlers:
logging.warning("Handler name %s existed is being overwritten", name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably this for better readability:
logging.warning("Handler name '%s' already exists and will be overwritten", name)

train_args,
PEFT_PT_ARGS,
additional_data_handlers=None,
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just add this line after this:

_validate_training(tempdir)

train_args,
PEFT_PT_ARGS,
additional_data_handlers={TEST_HANDLER: test_handler},
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here also, we can just add this line after this:

_validate_training(tempdir)

Signed-off-by: Abhishek <[email protected]>
@Abhishek-TAMU
Copy link
Collaborator

Abhishek-TAMU commented Dec 12, 2024

@willmj @dushyantbehl I just pushed the PR changes suggested by myself and some doc string changes suggested by Will. Looks good to merge now.
I don't have access to merge this PR directly. One of the 5 reviewers can approve it.

@Abhishek-TAMU Abhishek-TAMU enabled auto-merge (squash) December 12, 2024 22:02
@Abhishek-TAMU Abhishek-TAMU merged commit 689ee41 into foundation-model-stack:main Dec 13, 2024
8 checks passed
@dushyantbehl dushyantbehl deleted the additional_data_handlers branch December 20, 2024 05:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants