DeepSpeed Ulysses/ALST integration #3817
Conversation
SunMarc left a comment:
Thanks a lot, this is already looking quite nice! Left some minor comments. Please ping me when you have finished the integration!
Ulysses/ALST integration with HF Accelerate:
- Allow `UlyssesSPAttentionHF.register_with_transformers` to receive a `model` object as an argument, to match HF Accelerate's workflow
- Fix existing Ulysses tests to test z2 instead of z1
- Improve documentation
- Add a defensive check

The HF Accelerate PR that depends on this PR is huggingface/accelerate#3817.
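For context, here is a minimal sketch of the call shape the first item enables; the import path and every argument other than `model` are assumptions for illustration and may differ from the actual DeepSpeed signature:

```python
# Hypothetical sketch: registering Ulysses SP attention with an already
# instantiated HF model, as this change allows. Everything except the
# `model` argument is an assumption, not the confirmed DeepSpeed API.
from transformers import AutoModelForCausalLM
from deepspeed.runtime.sequence_parallel.ulysses_sp import UlyssesSPAttentionHF

model = AutoModelForCausalLM.from_pretrained("gpt2", attn_implementation="sdpa")

mpu = UlyssesSPAttentionHF.register_with_transformers(
    model=model,                      # pass the model object directly (new in this change)
    core_attn_implementation="sdpa",  # core attention backend to wrap (assumed name)
    sequence_parallel_size=2,         # Ulysses SP degree (assumed name)
    max_length=1024,                  # maximum sequence length (assumed name)
    micro_batch_size=1,               # per-rank micro batch size (assumed name)
)
```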
So we just have a few conversations above to complete, and otherwise I'm waiting for DeepSpeed to make a new release so that we can anchor on it here. Otherwise it's ready for your complete review, but don't merge just yet until we get the new DS version in. And then we can discuss the HF Trainer integration. Should we somehow mark this API as experimental, to let users use it for a bit and possibly adjust things? If so, please give me an example to follow.
There are some e2e examples in the example/torch_native_parallelism folder but we are not running them in the CI.
Let's try to integrate it into HF Trainer before merging this PR. Once it is tightly coupled to Trainer, even if the API is marked as experimental, we will most likely try to limit breaking changes. For experimental features, we just note it in the docs, like for big model inference (we probably need to remove the warning for that feature).
SunMarc left a comment:
Thanks for adding the docs and the tests, this looks really nice. Just some minor nits
Taking this out of code comments so that it doesn't disappear with 'Resolve conversation'
Those features belong to
Can we merge this PR, @stas00?
Thank you for further improvements/fixes, Kashif. Yes, let's merge it. Thank you.
Merging this PR seems to have invalidated the original CP implementation; see `src/accelerate/accelerator.py`, line 1639 at d1c96ba. However, `parallelism_config.sp_backend` can only be set to "deepspeed".
@egangu so for |
You're right. But what I mean is that the original CP cannot be enabled in the current version, no matter how the user sets the
Thanks for the report @egangu, let me test and fix.
This is the completion of the work started by @S1ro1 in #3782 to integrate ALST/Ulysses long-sequence training into HF Accelerate. Paper: https://arxiv.org/abs/2506.13996. This is Matej's original code with lots of additional work on top, plus docs and tests from me.
Here is the corresponding HF Trainer integration PR: huggingface/transformers#41832
If you want to try it out, please first install DeepSpeed from `deepspeed@master`, as DeepSpeed needed some tweaks to make this integration work. To use this feature a user needs:
- a `ParallelismConfig` (a minimal sketch follows this list)
- `shift_labels` and an aggregation of loss across ranks
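A hedged sketch of the setup, assuming the `sp_size`/`sp_backend` field names discussed in this thread; treat all names here as illustrative rather than the confirmed final API:

```python
# Hedged sketch, not the confirmed final API: enable DeepSpeed Ulysses SP
# through Accelerate's ParallelismConfig. The sp_size/sp_backend field
# names are assumptions based on the discussion in this PR.
from accelerate import Accelerator
from accelerate.parallelism_config import ParallelismConfig

pc = ParallelismConfig(
    sp_size=4,               # sequence-parallel degree: each sequence is sharded 4 ways (assumed name)
    sp_backend="deepspeed",  # route SP through DeepSpeed Ulysses (assumed name)
)
accelerator = Accelerator(parallelism_config=pc)
```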
Quality validation

I wrote 3 accelerate-based scripts (attached at the end of the OP):
The loss checks out with very small variations due to the precision loss in aggregation.
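For illustration, a hedged sketch of the kind of token-weighted cross-rank loss aggregation being validated here; the helper name and the `sp_group` process-group handle are assumptions, not code from this PR:

```python
# Hedged illustration: average the per-shard losses across the Ulysses SP
# group, weighted by each rank's number of target tokens, so the result
# matches the unsharded single-rank loss up to floating-point error.
import torch
import torch.distributed as dist

def aggregate_sp_loss(loss: torch.Tensor, num_tokens: int, sp_group) -> torch.Tensor:
    # pack (loss * tokens, tokens) so a single all_reduce yields both sums
    packed = torch.stack([
        loss.detach() * num_tokens,
        torch.tensor(float(num_tokens), device=loss.device, dtype=loss.dtype),
    ])
    dist.all_reduce(packed, op=dist.ReduceOp.SUM, group=sp_group)
    return packed[0] / packed[1]  # token-weighted mean over the SP group
```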
TODO
These are really needed for the HF Trainer PR huggingface/transformers#41832, but since it anchors on Accelerate, let's make the dependency here instead.
Scripts used to perform the quality validation
The scripts and config files are:
You run the `.sh` files.

cc: @SunMarc