feat: Perform dataset mixing via sampling probabilities in data config #408
ca3df7c to 2e64b40
@ashokponkumar @willmj @Abhishek-TAMU the data mixing feature is ready for review now.
@dushyantbehl How do we plan to support validation split and training split?
Thanks for the PR, @dushyantbehl. The code that adds sampling from multiple datasets looks good to me, and so do the unit tests. Just minor suggestions.
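For readers skimming the thread, here is a minimal sketch of what mixing datasets by sampling probabilities can look like. It is not the code from this PR: the function name `mix_datasets`, its parameters, and the wrap-around behaviour for exhausted datasets are all assumptions made for illustration, and it uses only the Python standard library.

```python
import random
from typing import List, Sequence


def mix_datasets(
    datasets: Sequence[Sequence[dict]],
    probabilities: Sequence[float],
    num_samples: int,
    seed: int = 42,
) -> List[dict]:
    """Hypothetical helper: draw `num_samples` records, choosing the source
    dataset for each draw according to `probabilities`."""
    if len(datasets) != len(probabilities):
        raise ValueError("need exactly one probability per dataset")
    if abs(sum(probabilities) - 1.0) > 1e-6:
        raise ValueError("sampling probabilities must sum to 1.0")
    if any(len(ds) == 0 for ds in datasets):
        raise ValueError("cannot sample from an empty dataset")

    rng = random.Random(seed)       # seeded so the mix is reproducible
    cursors = [0] * len(datasets)   # next unread position in each dataset
    mixed: List[dict] = []
    for _ in range(num_samples):
        # Pick which dataset this sample comes from.
        idx = rng.choices(range(len(datasets)), weights=probabilities, k=1)[0]
        ds = datasets[idx]
        # Assumption: wrap around when a dataset is exhausted (oversampling).
        mixed.append(ds[cursors[idx] % len(ds)])
        cursors[idx] += 1
    return mixed


# Toy usage: two in-memory "datasets" mixed at a 30/70 ratio.
small = [{"source": "small", "id": i} for i in range(3)]
large = [{"source": "large", "id": i} for i in range(5)]
mixed = mix_datasets([small, large], probabilities=[0.3, 0.7], num_samples=10)
```

Records are read from each dataset in order and only the choice of source is random, so the realized ratio approaches the configured probabilities as `num_samples` grows.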
Made an image for testing: https://v3.travis.ibm.com/github/ai-foundation/sft-trainer-image/builds/31001438
config Signed-off-by: Dushyant Behl <[email protected]>
2e64b40 to 177425f
@ashokponkumar as discussed offline, this feature will be left to subsequent patches, so the current patch is independent of it. Thanks.
willmj left a comment
LGTM! Thanks Dushyant
Abhishek-TAMU left a comment
Changes look good to me!