Skip to content

Conversation

@dushyantbehl
Copy link
Collaborator

@dushyantbehl dushyantbehl commented Dec 7, 2024

Description of the change

Related issue number

How to verify the PR

Was the PR tested

  • I have added >=1 unit test(s) for every new method I have added.
  • I have ensured all unit tests pass

@github-actions
Copy link

github-actions bot commented Dec 7, 2024

Thanks for making a pull request! 😃
One of the maintainers will review and advise on the next steps.

@github-actions github-actions bot added the feat label Dec 7, 2024
@dushyantbehl dushyantbehl force-pushed the interleave_datasets branch 6 times, most recently from ca3df7c to 2e64b40 Compare December 8, 2024 08:47
@dushyantbehl dushyantbehl marked this pull request as ready for review December 8, 2024 08:47
@dushyantbehl
Copy link
Collaborator Author

@ashokponkumar @willmj @Abhishek-TAMU the data mixing feature is ready for review now.

@ashokponkumar
Copy link
Collaborator

@dushyantbehl How do we plan to support validation split and training split?

Copy link
Collaborator

@Abhishek-TAMU Abhishek-TAMU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dushyantbehl, for the PR. The code to add the feature of sampling from multiple datasets looks good to me. The unit tests also look good. Just minor suggestions.

@willmj
Copy link
Collaborator

willmj commented Dec 10, 2024

Made an image for testing: https://v3.travis.ibm.com/github/ai-foundation/sft-trainer-image/builds/31001438

@dushyantbehl
Copy link
Collaborator Author

@dushyantbehl How do we plan to support validation split and training split?

@ashokponkumar as discussed offline this feature will be left to subsequent patches we apply so current patch is independent of this feature thanks.

Copy link
Collaborator

@willmj willmj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks Dushyant

Copy link
Collaborator

@Abhishek-TAMU Abhishek-TAMU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me!

@Abhishek-TAMU Abhishek-TAMU merged commit 4168c87 into foundation-model-stack:main Dec 10, 2024
8 checks passed
@dushyantbehl dushyantbehl deleted the interleave_datasets branch December 20, 2024 05:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants