fix: Using PaddingFree plugin with Iterable datasets and remove_columns #135
|
@fabianlim opening up the PR for a simple review for now. I will be doing more testing and sharing the error details we noticed before and after this patch. |
|
related - huggingface/transformers#34830 @dushyantbehl In case you wish to fix this from the transformers end, there is some interest from the HF team in a PR. If you want to work on this, feel free to take it up as you see fit. |
|
Thanks @kmehant, sure, I'll check it out and fix it. |
...attention-and-distributed-packing/src/fms_acceleration_aadp/framework_plugin_padding_free.py
Signed-off-by: Dushyant Behl <[email protected]>
|
related - #87 |
|
From @fabianlim's comment, we should ideally disable multipack, or fail gracefully when it is requested, when allowing streaming + padding free. Maybe this is something to be added to fms-hf-tuning. @dushyantbehl can you check Fabian's other concerns from this issue and see whether we have covered them - #87 (comment)? |
|
If I read @fabianlim 's comment on #87 (comment) It basically comes down to not supporting arbitrary collators inside On the use of |
|
@fabianlim FYA ^ thank you. |
|
@dushyantbehl can you rebase and fix the conflicts? |
|
@kmehant I would let you or @romitjain pick this up and we can close this PR. |
|
@kmehant the change was minimal so I just rebased the PR. Let's go ahead with this one itself so it's less work for you both.
While testing Iterable Datasets in a few training runs, we hit an error where the collator in use at training time was RemoveColumnsCollator. It seems that trl sometimes wraps the collator we pass, either DataCollatorForSeq2Seq or DataCollatorForCompletionOnlyLM, inside a RemoveColumnsCollator to remove columns at batch creation time.
The simple fix for making this work with the padding free plugin is to unwrap the actual collator and build a replacement for it.
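The unwrapping step described above could look roughly like the sketch below. It is not the code from this patch. ColumnRemovingWrapper is a hypothetical, simplified stand-in for RemoveColumnsCollator, and the sketch assumes the wrapper exposes the collator it wraps as a data_collator attribute. The helper name unwrap_collator and the max_depth guard are also illustrative.

```python
from typing import Any, Callable, Dict, Iterable, List

Features = List[Dict[str, Any]]
Collator = Callable[[Features], Any]


class ColumnRemovingWrapper:
    """Hypothetical stand-in for a wrapper such as RemoveColumnsCollator.

    Assumption: the wrapped collator is stored on `data_collator`.
    """

    def __init__(self, data_collator: Collator, keep_columns: Iterable[str]):
        self.data_collator = data_collator
        self.keep_columns = set(keep_columns)

    def __call__(self, features: Features) -> Any:
        # Drop columns the model does not accept, then delegate.
        trimmed = [
            {k: v for k, v in f.items() if k in self.keep_columns}
            for f in features
        ]
        return self.data_collator(trimmed)


def unwrap_collator(collator: Collator, max_depth: int = 8) -> Collator:
    """Follow `data_collator` attributes until the innermost collator is found.

    `max_depth` is an illustrative guard against self-referencing wrappers.
    """
    for _ in range(max_depth):
        inner = getattr(collator, "data_collator", None)
        if inner is None or not callable(inner):
            return collator
        collator = inner
    return collator


def list_collator(features: Features) -> Features:
    """Trivial collator used only for this example."""
    return features


wrapped = ColumnRemovingWrapper(list_collator, keep_columns=["input_ids"])
inner = unwrap_collator(wrapped)
```

Once the inner collator is recovered, a plugin can check its type and decide whether a padding-free replacement can be built for it. Whether the replacement should be re-wrapped so that column removal still happens is a separate design choice that this sketch does not cover.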