Commit cf74c6f

Merge branch 'main' into dggaytan/distributed_DDP_backup
2 parents e4ead3a + acc295d commit cf74c6f

45 files changed, 550 additions and 579 deletions

README.md

Lines changed: 1 addition & 5 deletions
@@ -1,7 +1,5 @@
 # PyTorch Examples

-![Run Examples](https://github.com/pytorch/examples/workflows/Run%20Examples/badge.svg)
-
 https://pytorch.org/examples/

 `pytorch/examples` is a repository showcasing examples of using [PyTorch](https://github.com/pytorch/pytorch). The goal is to have curated, short, few/no dependencies _high quality_ examples that are substantially different from each other that can be emulated in your existing work.
@@ -21,7 +19,7 @@ https://pytorch.org/examples/
 - [Variational Auto-Encoders](./vae/README.md)
 - [Superresolution using an efficient sub-pixel convolutional neural network](./super_resolution/README.md)
 - [Hogwild training of shared ConvNets across multiple processes on MNIST](mnist_hogwild)
-- [Training a CartPole to balance in OpenAI Gym with actor-critic](./reinforcement_learning/README.md)
+- [Training a CartPole to balance with actor-critic](./reinforcement_learning/README.md)
 - [Natural Language Inference (SNLI) with GloVe vectors, LSTMs, and torchtext](snli)
 - [Time sequence prediction - use an LSTM to learn Sine waves](./time_sequence_prediction/README.md)
 - [Implement the Neural Style Transfer algorithm on images](./fast_neural_style/README.md)
@@ -32,8 +30,6 @@ https://pytorch.org/examples/
 - [Image Classification Using Forward-Forward](./mnist_forward_forward/README.md)
 - [Language Translation using Transformers](./language_translation/README.md)

-
-
 Additionally, a list of good examples hosted in their own repositories:

 - [Neural Machine Translation using sequence-to-sequence RNN with attention (OpenNMT)](https://github.com/OpenNMT/OpenNMT-py)

distributed/FSDP/README.md

Lines changed: 6 additions & 2 deletions
@@ -1,6 +1,10 @@
-## FSDP T5
+Note: FSDP1 is deprecated. Please follow [FSDP2 tutorial](https://docs.pytorch.org/tutorials/intermediate/FSDP_tutorial.html) and [code examples](https://github.com/pytorch/examples/tree/main/distributed/FSDP2).

-To run the T5 example with FSDP for text summarization:
+## FSDP1 T5
+
+
+
+To run the T5 example with FSDP1 for text summarization:

 ## Get the wikihow dataset
 ```bash

distributed/FSDP/T5_training.py

Lines changed: 3 additions & 3 deletions
@@ -199,11 +199,11 @@ def fsdp_main(args):
     # Training settings
     parser = argparse.ArgumentParser(description='PyTorch T5 FSDP Example')
     parser.add_argument('--batch-size', type=int, default=4, metavar='N',
-                        help='input batch size for training (default: 64)')
+                        help='input batch size for training (default: 4)')
     parser.add_argument('--test-batch-size', type=int, default=4, metavar='N',
-                        help='input batch size for testing (default: 1000)')
+                        help='input batch size for testing (default: 4)')
     parser.add_argument('--epochs', type=int, default=2, metavar='N',
-                        help='number of epochs to train (default: 3)')
+                        help='number of epochs to train (default: 2)')
     parser.add_argument('--seed', type=int, default=1, metavar='S',
                         help='random seed (default: 1)')
     parser.add_argument('--track_memory', action='store_false', default=True,
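
The help strings corrected in this hunk had drifted because each default was written twice, once in `default=` and once in the help text. One way to avoid that, sketched here with the standard library only, is to let `argparse` interpolate `%(default)s`. The parser below is a cut-down stand-in, not the actual `T5_training.py` parser, and `build_parser` is a hypothetical name. It also shows that `--track_memory`, declared with `action='store_false'` and `default=True` as in the diff context, turns tracking off when the flag is passed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the parser in T5_training.py."""
    parser = argparse.ArgumentParser(description="PyTorch T5 FSDP Example")
    # %(default)s is filled in by argparse, so the help text cannot go stale.
    parser.add_argument("--batch-size", type=int, default=4, metavar="N",
                        help="input batch size for training (default: %(default)s)")
    parser.add_argument("--epochs", type=int, default=2, metavar="N",
                        help="number of epochs to train (default: %(default)s)")
    # Same declaration style as the diff context: passing the flag stores False.
    parser.add_argument("--track_memory", action="store_false", default=True,
                        help="pass this flag to turn memory tracking off")
    return parser


parser = build_parser()
defaults = parser.parse_args([])
overridden = parser.parse_args(["--batch-size", "8", "--track_memory"])
help_text = parser.format_help()
```

With this pattern, changing `default=4` to another value updates the `--help` output without a second edit.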

distributed/FSDP/utils/environment.py

Lines changed: 0 additions & 2 deletions
@@ -1,8 +1,6 @@
 # Copyright (c) 2022 Meta Platforms, Inc. and its affiliates.
 # All rights reserved.
 #
-# This source code is licensed under the Apache-style license found in the
-# LICENSE file in the root directory of this source tree.

 # This is a simple check to confirm that your current server has full bfloat support -
 # both GPU native support, and Network communication support.

distributed/FSDP2/README.md

Lines changed: 6 additions & 4 deletions
@@ -1,25 +1,27 @@
 ## FSDP2
 To run FSDP2 on transformer model:
+
 ```
 cd distributed/FSDP2
-torchrun --nproc_per_node 2 train.py
+pip install -r requirements.txt
+torchrun --nproc_per_node 2 example.py
 ```
 * For 1st time, it creates a "checkpoints" folder and saves state dicts there
 * For 2nd time, it loads from previous checkpoints

 To enable explicit prefetching
 ```
-torchrun --nproc_per_node 2 train.py --explicit-prefetch
+torchrun --nproc_per_node 2 example.py --explicit-prefetch
 ```

 To enable mixed precision
 ```
-torchrun --nproc_per_node 2 train.py --mixed-precision
+torchrun --nproc_per_node 2 example.py --mixed-precision
 ```

 To showcase DCP API
 ```
-torchrun --nproc_per_node 2 train.py --dcp-api
+torchrun --nproc_per_node 2 example.py --dcp-api
 ```

 ## Ensure you are running a recent version of PyTorch:

distributed/FSDP2/train.py renamed to distributed/FSDP2/example.py

Lines changed: 23 additions & 4 deletions
@@ -7,6 +7,11 @@
 from torch.distributed.fsdp import fully_shard, MixedPrecisionPolicy
 from utils import inspect_mixed_precision, inspect_model

+def verify_min_gpu_count(min_gpus: int = 2) -> bool:
+    """ verification that we have at least 2 gpus to run dist examples """
+    has_gpu = torch.accelerator.is_available()
+    gpu_count = torch.accelerator.device_count()
+    return has_gpu and gpu_count >= min_gpus

 def set_modules_to_forward_prefetch(model, num_to_forward_prefetch):
     for i, layer in enumerate(model.layers):
@@ -29,10 +34,23 @@ def set_modules_to_backward_prefetch(model, num_to_backward_prefetch):


 def main(args):
+    _min_gpu_count = 2
+    if not verify_min_gpu_count(min_gpus=_min_gpu_count):
+        print(f"Unable to locate sufficient {_min_gpu_count} gpus to run this example. Exiting.")
+        exit()
     rank = int(os.environ["LOCAL_RANK"])
-    device = torch.device(f"cuda:{rank}")
-    torch.cuda.set_device(device)
-    torch.distributed.init_process_group(backend="nccl", device_id=device)
+    if torch.accelerator.is_available():
+        device_type = torch.accelerator.current_accelerator()
+        device = torch.device(f"{device_type}:{rank}")
+        torch.accelerator.device_index(rank)
+        print(f"Running on rank {rank} on device {device}")
+    else:
+        device = torch.device("cpu")
+        print(f"Running on device {device}")
+
+    backend = torch.distributed.get_default_backend_for_device(device)
+    torch.distributed.init_process_group(backend=backend, device_id=device)
+
     torch.manual_seed(0)
     vocab_size = 1024
     batch_size = 32
@@ -64,7 +82,7 @@ def main(args):

     checkpointer = Checkpointer("checkpoints", dcp_api=args.dcp_api)
     if checkpointer.last_training_time is None:
-        model.to_empty(device="cuda")
+        model.to_empty(device=device)
         model.reset_parameters()
     else:
         checkpointer.load_model(model)
@@ -96,4 +114,5 @@ def main(args):
     parser.add_argument("--mixed-precision", action="store_true", default=False)
     parser.add_argument("--dcp-api", action="store_true", default=False)
     args = parser.parse_args()
+
     main(args)
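
The new `main` reads `LOCAL_RANK` from the environment, which `torchrun` sets for each worker, and then builds either an `<accelerator>:<rank>` device or a CPU device. A minimal sketch of those two steps follows. The helper names `read_local_rank` and `device_string` are hypothetical and not part of the commit, and the accelerator type is passed in as a plain string so the sketch runs without PyTorch installed.

```python
import os
from typing import Mapping, Optional


def read_local_rank(env: Mapping[str, str] = os.environ) -> int:
    """Return LOCAL_RANK as an int, with a readable error when it is missing."""
    try:
        return int(env["LOCAL_RANK"])
    except KeyError:
        # The commit's code would raise a bare KeyError here instead.
        raise RuntimeError(
            "LOCAL_RANK is not set; launch with torchrun, for example "
            "`torchrun --nproc_per_node 2 example.py`"
        ) from None


def device_string(accelerator_type: Optional[str], local_rank: int) -> str:
    """Mirror the diff's branch: '<type>:<rank>' with an accelerator, else 'cpu'."""
    if accelerator_type is None:
        return "cpu"
    return f"{accelerator_type}:{local_rank}"
```

One point that may be worth checking against the installed PyTorch version: the documentation describes `torch.accelerator.device_index` as a context manager, so the bare call on the added line may not change the current device. `torch.accelerator.set_device_index(rank)` appears to be the documented setter.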

distributed/FSDP2/requirements.txt

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+torch>=2.7
+numpy
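
The new `requirements.txt` pins `torch>=2.7`. A script could also check that bound at startup and fail with a clear message. The following is a sketch under stated assumptions, not something the commit does: `parse_major_minor` and `meets_minimum` are hypothetical helpers, and only the leading `major.minor` part of the version string is compared.

```python
import re
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '2.7.1+cu126'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version: str, minimum: Tuple[int, int] = (2, 7)) -> bool:
    """True when the version's (major, minor) is at least `minimum`."""
    # Tuple comparison avoids the string-ordering trap where "2.10" < "2.7".
    return parse_major_minor(version) >= minimum
```

In a real script this would be called as `meets_minimum(torch.__version__)` before any distributed setup.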

distributed/FSDP2/run_example.sh

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+# /bin/bash
+# bash run_example.sh {file_to_run.py} {num_gpus}
+# where file_to_run = example to run. Default = 'example.py'
+# num_gpus = num local gpus to use (must be at least 2). Default = 4
+
+# samples to run include:
+# example.py
+
+echo "Launching ${1:-example.py} with ${2:-4} gpus"
+torchrun --nnodes=1 --nproc_per_node=${2:-4} ${1:-example.py}
+
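
The script relies on shell default expansion: `${1:-example.py}` and `${2:-4}` fall back to `example.py` and `4` when an argument is missing or empty. The same defaulting can be sketched in Python as below. Both function names are hypothetical. The check for at least two GPUs is an assumption taken from the script's comment; the script itself does not enforce it. Note also that the script's first line, `# /bin/bash`, is an ordinary comment rather than a `#!/bin/bash` shebang.

```python
from typing import List, Sequence


def build_torchrun_command(script: str = "example.py", num_gpus: int = 4) -> List[str]:
    """Build the same argv that run_example.sh passes to torchrun."""
    if num_gpus < 2:
        # Assumption from the script's comment ("must be at least 2").
        raise ValueError("num_gpus must be at least 2")
    return ["torchrun", "--nnodes=1", f"--nproc_per_node={num_gpus}", script]


def command_from_args(args: Sequence[str]) -> List[str]:
    """Apply ${N:-default} semantics: fall back when missing or empty."""
    script = args[0] if len(args) > 0 and args[0] else "example.py"
    num_gpus = int(args[1]) if len(args) > 1 and args[1] else 4
    return build_torchrun_command(script, num_gpus)
```

The resulting list can be handed to `subprocess.run(..., check=True)` to launch the example.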
