7 changes: 4 additions & 3 deletions docs/README-EDiT.md
@@ -1,9 +1,10 @@
<h1 align="center"><b>EDiT</b></h1>
<h3 align="center"><b>EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models</b></h3>

[![ICLR](https://img.shields.io/badge/ICLR-2025-blue)](https://arxiv.org/abs/2412.07210)
[![arXiv](https://img.shields.io/badge/arXiv-2412.07210-b31b1b.svg)](https://arxiv.org/abs/2412.07210)

We present PyTorch code for [EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models](https://arxiv.org/abs/2412.07210).
We present PyTorch code for [EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models](https://arxiv.org/abs/2412.07210), ICLR'25.

# Introduction

@@ -262,7 +263,7 @@ python -m atorch.distributed.run --nproc_per_node 8 train.py \
--outer_optim_class sgd
```

## 3D (Megatron-LM) Integratopm
## 3D (Megatron-LM) Integration
EDiT can also be integrated with 3D training through megatron. In atorch, we provide such functionality through patches and ATorchTrainerV2.

ATorchTrainerV2 handles model construction and argument handling. To enable EDiT, we need to inject several special arguments to trainer's input arguments, and before everything, patch the megatron.
@@ -289,4 +290,4 @@ cd examples/local_sgd/atorch_trainer_megatron

bash train.sh
```
Alternatively, you may reference to atorch/local_sgd/megatron, and migrate the patches directly into your Megatron-LM repo.
Alternatively, you may reference to atorch/local_sgd/megatron, and migrate the patches directly into your Megatron-LM repo.