Commit 120f5b0
Merge branch 'main' into fix/ppo-mujoco-colab
2 parents: 7dc6054 + fee83dd
7 files changed: +19 additions, −689 deletions

Makefile

Lines changed: 0 additions & 17 deletions

@@ -61,23 +61,6 @@ download:
 	wget -nv -N https://s3.amazonaws.com/pytorch-tutorial-assets/cornell_movie_dialogs_corpus_v2.zip -P $(DATADIR)
 	unzip $(ZIPOPTS) $(DATADIR)/cornell_movie_dialogs_corpus_v2.zip -d beginner_source/data/
 
-	# Download model for advanced_source/dynamic_quantization_tutorial.py
-	wget -nv -N https://s3.amazonaws.com/pytorch-tutorial-assets/word_language_model_quantize.pth -P $(DATADIR)
-	cp $(DATADIR)/word_language_model_quantize.pth advanced_source/data/word_language_model_quantize.pth
-
-	# Download data for advanced_source/dynamic_quantization_tutorial.py
-	wget -nv -N https://s3.amazonaws.com/pytorch-tutorial-assets/wikitext-2.zip -P $(DATADIR)
-	unzip $(ZIPOPTS) $(DATADIR)/wikitext-2.zip -d advanced_source/data/
-
-	# Download model for advanced_source/static_quantization_tutorial.py
-	wget -nv -N https://download.pytorch.org/models/mobilenet_v2-b0353104.pth -P $(DATADIR)
-	cp $(DATADIR)/mobilenet_v2-b0353104.pth advanced_source/data/mobilenet_pretrained_float.pth
-
-
-	# Download model for prototype_source/graph_mode_static_quantization_tutorial.py
-	wget -nv -N https://download.pytorch.org/models/resnet18-5c106cde.pth -P $(DATADIR)
-	cp $(DATADIR)/resnet18-5c106cde.pth prototype_source/data/resnet18_pretrained_float.pth
 
 	# Download PennFudanPed dataset for intermediate_source/torchvision_tutorial.py
 	wget https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip -P $(DATADIR)
 	unzip -o $(DATADIR)/PennFudanPed.zip -d intermediate_source/data/
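Each removed rule follows the same caching pattern: fetch an asset once into a shared `$(DATADIR)` (with `wget -N` to skip unchanged files), then copy or unzip it into the tutorial's `data/` directory. A minimal offline sketch of that pattern — the network fetch is stubbed with a locally created file, so nothing here is downloaded:

```shell
set -eu

# Stand-ins for the Makefile's $(DATADIR) and a tutorial data/ directory.
DATADIR=$(mktemp -d)
DEST=$(mktemp -d)

# Stub for: wget -nv -N https://.../word_language_model_quantize.pth -P $(DATADIR)
# (-N only re-downloads when the remote file is newer than the cached copy)
echo "fake-model-weights" > "$DATADIR/word_language_model_quantize.pth"

# The copy step the rule runs after the cached download:
cp "$DATADIR/word_language_model_quantize.pth" "$DEST/word_language_model_quantize.pth"

cat "$DEST/word_language_model_quantize.pth"
```

The two-step layout means `make download` can be re-run cheaply: `wget -N` leaves the cache untouched when nothing changed upstream, and only the cheap local copy is repeated.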
Binary file not shown.

intermediate_source/ddp_series_minGPT.rst

Lines changed: 11 additions & 12 deletions

@@ -26,10 +26,11 @@ Authors: `Suraj Subramanian <https://github.com/subramen>`__
    .. grid-item-card:: :octicon:`list-unordered;1em;` Prerequisites
       :class-card: card-prerequisites
 
-      - Familiarity with `multi-GPU training <../beginner/ddp_series_multigpu.html>`__ and `torchrun <../beginner/ddp_series_fault_tolerance.html>`__
-      - [Optional] Familiarity with `multinode training <ddp_series_multinode.html>`__
-      - 2 or more TCP-reachable GPU machines (this tutorial uses AWS p3.2xlarge instances)
       - PyTorch `installed <https://pytorch.org/get-started/locally/>`__ with CUDA on all machines
+      - Familiarity with `multi-GPU training <../beginner/ddp_series_multigpu.html>`__ and `torchrun <../beginner/ddp_series_fault_tolerance.html>`__
+      - [Optional] Familiarity with `multinode training <ddp_series_multinode.html>`__
+      - 2 or more TCP-reachable GPU machines for multi-node training (this tutorial uses AWS p3.2xlarge instances)
+
 
 Follow along with the video below or on `youtube <https://www.youtube.com/watch/XFsFDGKZHh4>`__.
 
@@ -63,25 +64,23 @@ from any node that has access to the cloud bucket.
 
 Using Mixed Precision
 ~~~~~~~~~~~~~~~~~~~~~~~~
-To speed things up, you might be able to use `Mixed Precision <https://pytorch.org/docs/stable/amp.html>`__ to train your models.
-In Mixed Precision, some parts of the training process are carried out in reduced precision, while other steps
-that are more sensitive to precision drops are maintained in FP32 precision.
+To speed things up, you might be able to use `Mixed Precision <https://pytorch.org/docs/stable/amp.html>`__ to train your models.
+In Mixed Precision, some parts of the training process are carried out in reduced precision, while other steps
+that are more sensitive to precision drops are maintained in FP32 precision.
 
 
 When is DDP not enough?
 ~~~~~~~~~~~~~~~~~~~~~~~~
 A typical training run's memory footprint consists of model weights, activations, gradients, the input batch, and the optimizer state.
-Since DDP replicates the model on each GPU, it only works when GPUs have sufficient capacity to accomodate the full footprint.
+Since DDP replicates the model on each GPU, it only works when GPUs have sufficient capacity to accomodate the full footprint.
 When models grow larger, more aggressive techniques might be useful:
 
-- `activation checkpointing <https://pytorch.org/docs/stable/checkpoint.html>`__: Instead of saving intermediate activations during the forward pass, the activations are recomputed during the backward pass. In this approach, we run more compute but save on memory footprint.
-- `Fully-Sharded Data Parallel <https://pytorch.org/docs/stable/fsdp.html>`__: Here the model is not replicated but "sharded" across all the GPUs, and computation is overlapped with communication in the forward and backward passes. Read our `blog <https://medium.com/pytorch/training-a-1-trillion-parameter-model-with-pytorch-fully-sharded-data-parallel-on-aws-3ac13aa96cff>`__ to learn how we trained a 1 Trillion parameter model with FSDP.
-
+- `Activation checkpointing <https://pytorch.org/docs/stable/checkpoint.html>`__: Instead of saving intermediate activations during the forward pass, the activations are recomputed during the backward pass. In this approach, we run more compute but save on memory footprint.
+- `Fully-Sharded Data Parallel <https://docs.pytorch.org/docs/stable/distributed.fsdp.fully_shard.html>`__: Here the model is not replicated but "sharded" across all the GPUs, and computation is overlapped with communication in the forward and backward passes. Read our `blog <https://medium.com/pytorch/training-a-1-trillion-parameter-model-with-pytorch-fully-sharded-data-parallel-on-aws-3ac13aa96cff>`__ to learn how we trained a 1 Trillion parameter model with FSDP.
 
 Further Reading
 ---------------
 - `Multi-Node training with DDP <ddp_series_multinode.html>`__ (previous tutorial in this series)
 - `Mixed Precision training <https://pytorch.org/docs/stable/amp.html>`__
-- `Fully-Sharded Data Parallel <https://pytorch.org/docs/stable/fsdp.html>`__
+- `Fully-Sharded Data Parallel tutorial <https://docs.pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`__
 - `Training a 1T parameter model with FSDP <https://medium.com/pytorch/training-a-1-trillion-parameter-model-with-pytorch-fully-sharded-data-parallel-on-aws-3ac13aa96cff>`__
-- `FSDP Video Tutorial Series <https://www.youtube.com/playlist?list=PL_lsbAsL_o2BT6aerEKgIoufVD_fodnuT>`__
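The Mixed Precision paragraph in the diff above changes only whitespace, but the idea it describes — do most arithmetic in reduced precision while keeping precision-sensitive steps (such as long accumulations) in FP32 — can be illustrated with a toy, pure-Python sketch. This emulates FP16 rounding via the `struct` module's half-precision format; it is not PyTorch AMP itself:

```python
import struct

def to_fp16(x: float) -> float:
    # Round-trip through IEEE 754 half precision ('e' struct format)
    return struct.unpack('e', struct.pack('e', x))[0]

# Summing many small gradients: once the running sum grows, each 1e-3
# update falls below half an FP16 ulp and is rounded away entirely.
vals = [1e-3] * 4096
fp16_sum = 0.0
for v in vals:
    fp16_sum = to_fp16(fp16_sum + to_fp16(v))
full_sum = sum(vals)          # accumulated in full (double) precision

print(full_sum > fp16_sum)    # True: the FP16 accumulator stalls short of 4.096
```

This is exactly why mixed-precision recipes keep the master weights and gradient accumulation in FP32 while running the bulk of the matmuls in FP16/BF16.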

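The activation-checkpointing bullet in the diff trades compute for memory: keep only a few "checkpoint" inputs during the forward pass and recompute the intermediate activations during the backward pass. A minimal sketch of that trade-off (plain Python, with a made-up `layer` function standing in for a real network layer; not `torch.utils.checkpoint` itself):

```python
def layer(x):
    # Hypothetical stand-in for an expensive network layer
    return x * 2 + 1

def forward_store_all(x, n):
    # Ordinary forward: keep every intermediate activation for backward.
    acts = [x]
    for _ in range(n):
        x = layer(x)
        acts.append(x)
    return x, acts               # memory cost: n + 1 activations

def forward_checkpointed(x, n):
    # Checkpointed forward: remember only the segment input.
    checkpoint = x               # memory cost: 1 value
    for _ in range(n):
        x = layer(x)
    return x, checkpoint

def recompute(checkpoint, n):
    # During backward, re-run the forward from the checkpoint
    # to regenerate the activations that were never stored.
    _, acts = forward_store_all(checkpoint, n)
    return acts

out1, acts = forward_store_all(3, 4)
out2, ckpt = forward_checkpointed(3, 4)
assert out1 == out2
assert recompute(ckpt, 4) == acts   # same activations, recomputed on demand
print(out1)                         # → 63
```

The extra forward pass per segment is the "more compute" the bullet mentions; the payoff is that peak activation memory scales with the checkpoint count rather than the layer count.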
prototype_source/README.txt

Lines changed: 6 additions & 30 deletions

@@ -4,42 +4,18 @@ Prototype Tutorials
 Profiling PyTorch RPC-Based Workloads
 https://github.com/pytorch/tutorials/blob/main/prototype_source/distributed_rpc_profiling.rst
 
-2. graph_mode_static_quantization_tutorial.py
-Graph Mode Post Training Static Quantization in PyTorch
-https://pytorch.org/tutorials/prototype/graph_mode_static_quantization_tutorial.html
-
-3. graph_mode_dynamic_bert_tutorial.rst
-Graph Mode Dynamic Quantization on BERT
-https://github.com/pytorch/tutorials/blob/main/prototype_source/graph_mode_dynamic_bert_tutorial.rst
-
-4. numeric_suite_tutorial.py
-PyTorch Numeric Suite Tutorial
-https://github.com/pytorch/tutorials/blob/main/prototype_source/numeric_suite_tutorial.py
-
-5. torchscript_freezing.py
+2. torchscript_freezing.py
 Model Freezing in TorchScript
 https://github.com/pytorch/tutorials/blob/main/prototype_source/torchscript_freezing.py
 
-6. vulkan_workflow.rst
+3. vulkan_workflow.rst
 Vulkan Backend User Workflow
-https://pytorch.org/tutorials/intermediate/vulkan_workflow.html
-
-7. fx_graph_mode_ptq_static.rst
-FX Graph Mode Post Training Static Quantization
-https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_static.html
-
-8. fx_graph_mode_ptq_dynamic.py
-FX Graph Mode Post Training Dynamic Quantization
-https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_dynamic.html
-
-9. fx_graph_mode_quant_guide.py
-FX Graph Mode Quantization User Guide
-https://pytorch.org/tutorials/prototype/fx_graph_mode_quant_guide.html
-
-10 flight_recorder_tutorial.rst
+https://pytorch.org/tutorials/prototype/vulkan_workflow.html
+
+4. flight_recorder_tutorial.rst
 Flight Recorder User Guide
 https://pytorch.org/tutorials/prototype/flight_recorder_tutorial.html
 
-11 python_extension_autoload.rst
+5. python_extension_autoload.rst
 Autoloading Out-of-Tree Extension
 https://pytorch.org/tutorials/prototype/python_extension_autoload.html
