Commit 120f5b0
Merge branch 'main' into fix/ppo-mujoco-colab
2 parents: 7dc6054 + fee83dd
7 files changed: +19 additions, −689 deletions

Makefile

Lines changed: 0 additions & 17 deletions

@@ -61,23 +61,6 @@ download:
 	wget -nv -N https://s3.amazonaws.com/pytorch-tutorial-assets/cornell_movie_dialogs_corpus_v2.zip -P $(DATADIR)
 	unzip $(ZIPOPTS) $(DATADIR)/cornell_movie_dialogs_corpus_v2.zip -d beginner_source/data/
 
-	# Download model for advanced_source/dynamic_quantization_tutorial.py
-	wget -nv -N https://s3.amazonaws.com/pytorch-tutorial-assets/word_language_model_quantize.pth -P $(DATADIR)
-	cp $(DATADIR)/word_language_model_quantize.pth advanced_source/data/word_language_model_quantize.pth
-
-	# Download data for advanced_source/dynamic_quantization_tutorial.py
-	wget -nv -N https://s3.amazonaws.com/pytorch-tutorial-assets/wikitext-2.zip -P $(DATADIR)
-	unzip $(ZIPOPTS) $(DATADIR)/wikitext-2.zip -d advanced_source/data/
-
-	# Download model for advanced_source/static_quantization_tutorial.py
-	wget -nv -N https://download.pytorch.org/models/mobilenet_v2-b0353104.pth -P $(DATADIR)
-	cp $(DATADIR)/mobilenet_v2-b0353104.pth advanced_source/data/mobilenet_pretrained_float.pth
-
-
-	# Download model for prototype_source/graph_mode_static_quantization_tutorial.py
-	wget -nv -N https://download.pytorch.org/models/resnet18-5c106cde.pth -P $(DATADIR)
-	cp $(DATADIR)/resnet18-5c106cde.pth prototype_source/data/resnet18_pretrained_float.pth
 
 	# Download PennFudanPed dataset for intermediate_source/torchvision_tutorial.py
 	wget https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip -P $(DATADIR)
 	unzip -o $(DATADIR)/PennFudanPed.zip -d intermediate_source/data/
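Each removed rule follows the same caching pattern: fetch an asset once into a shared `$(DATADIR)` (with `wget -N` to skip unchanged files), then copy or unzip it into the tutorial's `data/` directory. A minimal offline sketch of that pattern — the network fetch is stubbed with a locally created file, so nothing here is downloaded:

```shell
set -eu

# Stand-ins for the Makefile's $(DATADIR) and a tutorial data/ directory.
DATADIR=$(mktemp -d)
DEST=$(mktemp -d)

# Stub for: wget -nv -N https://.../word_language_model_quantize.pth -P $(DATADIR)
# (-N only re-downloads when the remote file is newer than the cached copy)
echo "fake-model-weights" > "$DATADIR/word_language_model_quantize.pth"

# The copy step the rule runs after the cached download:
cp "$DATADIR/word_language_model_quantize.pth" "$DEST/word_language_model_quantize.pth"

cat "$DEST/word_language_model_quantize.pth"
```

The two-step layout means `make download` can be re-run cheaply: `wget -N` leaves the cache untouched when nothing changed upstream, and only the cheap local copy is repeated.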
Binary file not shown.

intermediate_source/ddp_series_minGPT.rst

Lines changed: 11 additions & 12 deletions

@@ -26,10 +26,11 @@ Authors: `Suraj Subramanian <https://github.com/subramen>`__
    .. grid-item-card:: :octicon:`list-unordered;1em;` Prerequisites
       :class-card: card-prerequisites
 
-      - Familiarity with `multi-GPU training <../beginner/ddp_series_multigpu.html>`__ and `torchrun <../beginner/ddp_series_fault_tolerance.html>`__
-      - [Optional] Familiarity with `multinode training <ddp_series_multinode.html>`__
-      - 2 or more TCP-reachable GPU machines (this tutorial uses AWS p3.2xlarge instances)
       - PyTorch `installed <https://pytorch.org/get-started/locally/>`__ with CUDA on all machines
+      - Familiarity with `multi-GPU training <../beginner/ddp_series_multigpu.html>`__ and `torchrun <../beginner/ddp_series_fault_tolerance.html>`__
+      - [Optional] Familiarity with `multinode training <ddp_series_multinode.html>`__
+      - 2 or more TCP-reachable GPU machines for multi-node training (this tutorial uses AWS p3.2xlarge instances)
+
 
 Follow along with the video below or on `youtube <https://www.youtube.com/watch/XFsFDGKZHh4>`__.
 
@@ -63,25 +64,23 @@ from any node that has access to the cloud bucket.
 
 Using Mixed Precision
 ~~~~~~~~~~~~~~~~~~~~~~~~
-To speed things up, you might be able to use `Mixed Precision <https://pytorch.org/docs/stable/amp.html>`__ to train your models.
-In Mixed Precision, some parts of the training process are carried out in reduced precision, while other steps
-that are more sensitive to precision drops are maintained in FP32 precision.
+To speed things up, you might be able to use `Mixed Precision <https://pytorch.org/docs/stable/amp.html>`__ to train your models.
+In Mixed Precision, some parts of the training process are carried out in reduced precision, while other steps
+that are more sensitive to precision drops are maintained in FP32 precision.
 
 
 When is DDP not enough?
 ~~~~~~~~~~~~~~~~~~~~~~~~
 A typical training run's memory footprint consists of model weights, activations, gradients, the input batch, and the optimizer state.
-Since DDP replicates the model on each GPU, it only works when GPUs have sufficient capacity to accomodate the full footprint.
+Since DDP replicates the model on each GPU, it only works when GPUs have sufficient capacity to accomodate the full footprint.
 When models grow larger, more aggressive techniques might be useful:
 
-- `activation checkpointing <https://pytorch.org/docs/stable/checkpoint.html>`__: Instead of saving intermediate activations during the forward pass, the activations are recomputed during the backward pass. In this approach, we run more compute but save on memory footprint.
-- `Fully-Sharded Data Parallel <https://pytorch.org/docs/stable/fsdp.html>`__: Here the model is not replicated but "sharded" across all the GPUs, and computation is overlapped with communication in the forward and backward passes. Read our `blog <https://medium.com/pytorch/training-a-1-trillion-parameter-model-with-pytorch-fully-sharded-data-parallel-on-aws-3ac13aa96cff>`__ to learn how we trained a 1 Trillion parameter model with FSDP.
-
+- `Activation checkpointing <https://pytorch.org/docs/stable/checkpoint.html>`__: Instead of saving intermediate activations during the forward pass, the activations are recomputed during the backward pass. In this approach, we run more compute but save on memory footprint.
+- `Fully-Sharded Data Parallel <https://docs.pytorch.org/docs/stable/distributed.fsdp.fully_shard.html>`__: Here the model is not replicated but "sharded" across all the GPUs, and computation is overlapped with communication in the forward and backward passes. Read our `blog <https://medium.com/pytorch/training-a-1-trillion-parameter-model-with-pytorch-fully-sharded-data-parallel-on-aws-3ac13aa96cff>`__ to learn how we trained a 1 Trillion parameter model with FSDP.
 
 Further Reading
 ---------------
 - `Multi-Node training with DDP <ddp_series_multinode.html>`__ (previous tutorial in this series)
 - `Mixed Precision training <https://pytorch.org/docs/stable/amp.html>`__
-- `Fully-Sharded Data Parallel <https://pytorch.org/docs/stable/fsdp.html>`__
+- `Fully-Sharded Data Parallel tutorial <https://docs.pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`__
 - `Training a 1T parameter model with FSDP <https://medium.com/pytorch/training-a-1-trillion-parameter-model-with-pytorch-fully-sharded-data-parallel-on-aws-3ac13aa96cff>`__
-- `FSDP Video Tutorial Series <https://www.youtube.com/playlist?list=PL_lsbAsL_o2BT6aerEKgIoufVD_fodnuT>`__
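The Mixed Precision paragraph in the diff above changes only whitespace, but the idea it describes — do most arithmetic in reduced precision while keeping precision-sensitive steps (such as long accumulations) in FP32 — can be illustrated with a toy, pure-Python sketch. This emulates FP16 rounding via the `struct` module's half-precision format; it is not PyTorch AMP itself:

```python
import struct

def to_fp16(x: float) -> float:
    # Round-trip through IEEE 754 half precision ('e' struct format)
    return struct.unpack('e', struct.pack('e', x))[0]

# Summing many small gradients: once the running sum grows, each 1e-3
# update falls below half an FP16 ulp and is rounded away entirely.
vals = [1e-3] * 4096
fp16_sum = 0.0
for v in vals:
    fp16_sum = to_fp16(fp16_sum + to_fp16(v))
full_sum = sum(vals)          # accumulated in full (double) precision

print(full_sum > fp16_sum)    # True: the FP16 accumulator stalls short of 4.096
```

This is exactly why mixed-precision recipes keep the master weights and gradient accumulation in FP32 while running the bulk of the matmuls in FP16/BF16.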

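The activation-checkpointing bullet in the diff trades compute for memory: keep only a few "checkpoint" inputs during the forward pass and recompute the intermediate activations during the backward pass. A minimal sketch of that trade-off (plain Python, with a made-up `layer` function standing in for a real network layer; not `torch.utils.checkpoint` itself):

```python
def layer(x):
    # Hypothetical stand-in for an expensive network layer
    return x * 2 + 1

def forward_store_all(x, n):
    # Ordinary forward: keep every intermediate activation for backward.
    acts = [x]
    for _ in range(n):
        x = layer(x)
        acts.append(x)
    return x, acts               # memory cost: n + 1 activations

def forward_checkpointed(x, n):
    # Checkpointed forward: remember only the segment input.
    checkpoint = x               # memory cost: 1 value
    for _ in range(n):
        x = layer(x)
    return x, checkpoint

def recompute(checkpoint, n):
    # During backward, re-run the forward from the checkpoint
    # to regenerate the activations that were never stored.
    _, acts = forward_store_all(checkpoint, n)
    return acts

out1, acts = forward_store_all(3, 4)
out2, ckpt = forward_checkpointed(3, 4)
assert out1 == out2
assert recompute(ckpt, 4) == acts   # same activations, recomputed on demand
print(out1)                         # → 63
```

The extra forward pass per segment is the "more compute" the bullet mentions; the payoff is that peak activation memory scales with the checkpoint count rather than the layer count.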
prototype_source/README.txt

Lines changed: 6 additions & 30 deletions

@@ -4,42 +4,18 @@ Prototype Tutorials
 Profiling PyTorch RPC-Based Workloads
 https://github.com/pytorch/tutorials/blob/main/prototype_source/distributed_rpc_profiling.rst
 
-2. graph_mode_static_quantization_tutorial.py
-Graph Mode Post Training Static Quantization in PyTorch
-https://pytorch.org/tutorials/prototype/graph_mode_static_quantization_tutorial.html
-
-3. graph_mode_dynamic_bert_tutorial.rst
-Graph Mode Dynamic Quantization on BERT
-https://github.com/pytorch/tutorials/blob/main/prototype_source/graph_mode_dynamic_bert_tutorial.rst
-
-4. numeric_suite_tutorial.py
-PyTorch Numeric Suite Tutorial
-https://github.com/pytorch/tutorials/blob/main/prototype_source/numeric_suite_tutorial.py
-
-5. torchscript_freezing.py
+2. torchscript_freezing.py
 Model Freezing in TorchScript
 https://github.com/pytorch/tutorials/blob/main/prototype_source/torchscript_freezing.py
 
-6. vulkan_workflow.rst
+3. vulkan_workflow.rst
 Vulkan Backend User Workflow
-https://pytorch.org/tutorials/intermediate/vulkan_workflow.html
-
-7. fx_graph_mode_ptq_static.rst
-FX Graph Mode Post Training Static Quantization
-https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_static.html
-
-8. fx_graph_mode_ptq_dynamic.py
-FX Graph Mode Post Training Dynamic Quantization
-https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_dynamic.html
-
-9. fx_graph_mode_quant_guide.py
-FX Graph Mode Quantization User Guide
-https://pytorch.org/tutorials/prototype/fx_graph_mode_quant_guide.html
-
-10 flight_recorder_tutorial.rst
+https://pytorch.org/tutorials/prototype/vulkan_workflow.html
+
+4. flight_recorder_tutorial.rst
 Flight Recorder User Guide
 https://pytorch.org/tutorials/prototype/flight_recorder_tutorial.html
 
-11 python_extension_autoload.rst
+5. python_extension_autoload.rst
 Autoloading Out-of-Tree Extension
 https://pytorch.org/tutorials/prototype/python_extension_autoload.html
