
Conversation

@shenzheyu

No description provided.

@GuanhuaWang
Contributor

@hwchen2017, please follow up on this PR. Thank you!

loadams and others added 19 commits March 5, 2025 17:55
Signed-off-by: Zheyu SHEN <zyshen@umd.edu>
Signed-off-by: Logan Adams <loadams@microsoft.com>
Signed-off-by: Zheyu SHEN <zyshen@umd.edu>
Signed-off-by: Zheyu SHEN <zyshen@umd.edu>
Signed-off-by: Zheyu SHEN <zyshen@umd.edu>
Propagate API change.

Signed-off-by: Olatunji Ruwase <olruwase@microsoft.com>
Signed-off-by: Zheyu SHEN <zyshen@umd.edu>
- Add ZeRO-2 test
- Minor fixes for the transformers version update and DeepSpeed master merge.

Signed-off-by: inkcherry <mingzhi.liu@intel.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Signed-off-by: Zheyu SHEN <zyshen@umd.edu>
With bf16 and MoE, refreshing optimizer state from a bf16 checkpoint raises:
IndexError: list index out of range

Signed-off-by: shaomin <wukon1992@gmail.com>
Co-authored-by: shaomin <wukon1992@gmail.com>
Co-authored-by: Hongwei Chen <33092912+hwchen2017@users.noreply.github.com>
Signed-off-by: Zheyu SHEN <zyshen@umd.edu>
**Auto-generated PR to update version.txt after a DeepSpeed release**
Released version - 0.16.4
Author           - @loadams

Co-authored-by: loadams <loadams@users.noreply.github.com>
Signed-off-by: Zheyu SHEN <zyshen@umd.edu>
@jeffra and I fixed this many years ago, so this brings the doc to a
correct state.

---------

Signed-off-by: Stas Bekman <stas@stason.org>
Signed-off-by: Zheyu SHEN <zyshen@umd.edu>
Description
This PR adds Tecorigin SDAA accelerator support.
With this PR, DeepSpeed supports SDAA as a backend for training tasks.

---------

Signed-off-by: siqi <siqi@tecorigin.com>
Co-authored-by: siqi <siqi@tecorigin.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Signed-off-by: Zheyu SHEN <zyshen@umd.edu>
More information on libuv in PyTorch:
https://pytorch.org/tutorials/intermediate/TCPStore_libuv_backend.html
Issue tracking the prevalence of the error on Windows (unresolved at the
time of this PR): pytorch/pytorch#139990
libuv GitHub: https://github.com/libuv/libuv

Windows error:
```
  File "C:\hostedtoolcache\windows\Python\3.12.7\x64\Lib\site-packages\torch\distributed\rendezvous.py", line 189, in _create_c10d_store
    return TCPStore(
           ^^^^^^^^^
RuntimeError: use_libuv was requested but PyTorch was build without libuv support
```

`use_libuv` isn't well supported on Windows in PyTorch < 2.4, so we need to
guard against this case.
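A minimal sketch of such a guard, assuming we key off the PyTorch version and platform (the helper name and its callers are illustrative, not the PR's actual code):

```python
def tcp_store_kwargs(torch_version: str, platform: str) -> dict:
    """Return extra TCPStore kwargs, disabling libuv where it is unsupported.

    Hypothetical helper; the real guard in this PR lives in DeepSpeed's
    distributed init path.
    """
    major, minor = (int(part) for part in torch_version.split(".")[:2])
    if platform == "win32" and (major, minor) < (2, 4):
        # PyTorch wheels before 2.4 were often built without libuv support
        # on Windows, so fall back to the legacy TCPStore backend.
        return {"use_libuv": False}
    return {}
```

The returned dict can then be splatted into the `TCPStore(...)` call so newer PyTorch versions keep their default libuv behavior.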

---------

Signed-off-by: Logan Adams <loadams@microsoft.com>
Signed-off-by: Zheyu SHEN <zyshen@umd.edu>
Signed-off-by: Logan Adams <loadams@microsoft.com>
Signed-off-by: Zheyu SHEN <zyshen@umd.edu>
@fukun07 and I discovered a bug when using the `offload_states` and
`reload_states` APIs of the ZeRO-3 optimizer. When using grouped
parameters (for example, in weight decay or grouped LR scenarios), the
order of the parameter mapping in `reload_states`
([here](https://github.com/deepspeedai/DeepSpeed/blob/14b3cce4aaedac69120d386953e2b4cae8c2cf2c/deepspeed/runtime/zero/stage3.py#L2953))
does not match the order used to initialize `self.lp_param_buffer`
([here](https://github.com/deepspeedai/DeepSpeed/blob/14b3cce4aaedac69120d386953e2b4cae8c2cf2c/deepspeed/runtime/zero/stage3.py#L731)),
which leads to misaligned parameter loading. This issue was missed
by the corresponding unit tests
([here](https://github.com/deepspeedai/DeepSpeed/blob/master/tests/unit/runtime/zero/test_offload_states.py)),
so this PR fixes the bug and adds the corresponding unit tests.
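A toy sketch of the failure mode (all names illustrative, not DeepSpeed internals): offsets into a flat buffer are only valid if the reload walks parameters in exactly the order the buffer was built in.

```python
# Parameters partitioned into groups, e.g. for weight decay vs. no decay.
param_groups = {"decay": ["w1", "w2"], "no_decay": ["b1"]}

# Buffer build order: flatten groups in their original order -> w1, w2, b1.
build_order = [p for group in param_groups.values() for p in group]

# A reload that iterates in any other order (here, sorted by name) reads
# each parameter's bytes from another parameter's offset in the buffer.
buggy_reload_order = sorted(build_order)

assert build_order != buggy_reload_order  # -> misaligned parameter loading
```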

---------

Signed-off-by: Wei Wu <wuwei211x@gmail.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
Signed-off-by: Zheyu SHEN <zyshen@umd.edu>
Signed-off-by: Logan Adams <loadams@microsoft.com>
Signed-off-by: Zheyu SHEN <zyshen@umd.edu>
Following changes in PyTorch trace rules, my previous PR to avoid graph
breaks caused by the logger is no longer relevant. Instead, I've added
this functionality to torch dynamo:
pytorch/pytorch@16ea0dd
This commit lets the user configure torch to ignore logger methods and
avoid the associated graph breaks.

To ignore all logger methods, set
os.environ["DISABLE_LOGS_WHILE_COMPILING"] = "1".
To ignore logger methods except for specific ones (for
example, `info` and `isEnabledFor`), additionally set
os.environ["LOGGER_METHODS_TO_EXCLUDE_FROM_DISABLE"] = "info,
isEnabledFor".
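The two settings above as a usage sketch (the environment variable names come from this PR; set them before compilation starts):

```python
import os

# Ignore all logger methods while torch.compile is tracing, avoiding the
# associated graph breaks.
os.environ["DISABLE_LOGS_WHILE_COMPILING"] = "1"

# Optionally keep selected methods active; a comma-separated list.
os.environ["LOGGER_METHODS_TO_EXCLUDE_FROM_DISABLE"] = "info,isEnabledFor"
```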

Signed-off-by: ShellyNR <shelly.nahir@live.biu.ac.il>
Co-authored-by: snahir <snahir@habana.ai>
Signed-off-by: Zheyu SHEN <zyshen@umd.edu>
The partition tensor doesn't need to be moved to the current device when
meta load is used.

Signed-off-by: Lai, Yejing <yejing.lai@intel.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Signed-off-by: Zheyu SHEN <zyshen@umd.edu>
…t` (deepspeedai#7069)

With future changes coming to pip/python/etc., we need to stop calling
`python setup.py ...` and replace those invocations; see:
https://packaging.python.org/en/latest/guides/modernize-setup-py-project/#should-setup-py-be-deleted


![image](https://github.com/user-attachments/assets/ea39ef7b-3cbe-4916-86f0-bc46a5fce96d)

This means we need to install the `build` package, which is added here as
well.

Additionally, we pass the `--sdist` flag to build only the sdist rather
than the wheel.

---------

Signed-off-by: Logan Adams <loadams@microsoft.com>
Signed-off-by: Zheyu SHEN <zyshen@umd.edu>
Add DeepSeek-V3 AutoTP support.

Signed-off-by: Lai, Yejing <yejing.lai@intel.com>
Signed-off-by: Zheyu SHEN <zyshen@umd.edu>
@loadams
Collaborator

loadams commented Aug 11, 2025

@shenzheyu - could you please resolve merge conflicts and then we can get this reviewed? Thanks!

@GuanhuaWang
Contributor

@shenzheyu, please help here. Thanks!

@hwchen2017 hwchen2017 marked this pull request as draft September 3, 2025 05:45
