|
28 | 28 | "a few lines of code and can be applied to a wide range of deep learning\n", |
29 | 29 | "models across all domains.\n", |
30 | 30 | "\n", |
| 31 | + "<div style=\"width: 45%; float: left; padding: 20px;\"><h2>What you will learn</h2><ul><li>General optimization techniques for PyTorch models</li><li>CPU-specific performance optimizations</li><li>GPU acceleration strategies</li><li>Distributed training optimizations</li></ul></div><div style=\"width: 45%; float: right; padding: 20px;\"><h2>Prerequisites</h2><ul><li>PyTorch 2.0 or later</li><li>Python 3.8 or later</li><li>CUDA-capable GPU (recommended for GPU optimizations)</li><li>Linux, macOS, or Windows operating system</li></ul></div>\n",
| 32 | + "\n", |
| 33 | + "Overview\n", |
| 34 | + "--------\n", |
| 35 | + "\n", |
| 36 | + "Performance optimization is crucial for efficient deep learning model\n", |
| 37 | + "training and inference. This tutorial covers a comprehensive set of\n", |
| 38 | + "techniques to accelerate PyTorch workloads across different hardware\n", |
| 39 | + "configurations and use cases.\n", |
| 40 | + "\n", |
31 | 41 | "General optimizations\n", |
32 | 42 | "---------------------\n" |
33 | 43 | ] |
34 | 44 | }, |
| 45 | + { |
| 46 | + "cell_type": "code", |
| 47 | + "execution_count": null, |
| 48 | + "metadata": { |
| 49 | + "collapsed": false |
| 50 | + }, |
| 51 | + "outputs": [], |
| 52 | + "source": [ |
| 53 | + "import torch\n", |
| 54 | + "import torchvision" |
| 55 | + ] |
| 56 | + }, |
35 | 57 | { |
36 | 58 | "cell_type": "markdown", |
37 | 59 | "metadata": {}, |
|
157 | 179 | "than setting it to zero. For more details, refer to the\n",
158 | 180 | "[documentation](https://pytorch.org/docs/master/optim.html#torch.optim.Optimizer.zero_grad).\n", |
159 | 181 | "\n", |
160 | | - "Alternatively, starting from PyTorch 1.7, call `model` or\n", |
161 | | - "`optimizer.zero_grad(set_to_none=True)`.\n" |
| 182 | + "Alternatively, call `model.zero_grad(set_to_none=True)` or `optimizer.zero_grad(set_to_none=True)`.\n"
162 | 183 | ] |
163 | 184 | }, |
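| | + {
| | + "cell_type": "markdown",
| | + "metadata": {},
| | + "source": [
| | + "As a minimal sketch of a training step with `set_to_none=True` (the\n",
| | + "model, optimizer, and batch below are toy placeholders for\n",
| | + "illustration):\n"
| | + ]
| | + },
| | + {
| | + "cell_type": "code",
| | + "execution_count": null,
| | + "metadata": {
| | + "collapsed": false
| | + },
| | + "outputs": [],
| | + "source": [
| | + "# Toy model, optimizer, and batch used only for illustration\n",
| | + "model = torch.nn.Linear(128, 10)\n",
| | + "optimizer = torch.optim.SGD(model.parameters(), lr=0.01)\n",
| | + "inputs, targets = torch.randn(32, 128), torch.randint(0, 10, (32,))\n",
| | + "\n",
| | + "# Setting gradients to None skips the memset and avoids a\n",
| | + "# read-modify-write in the backward pass\n",
| | + "optimizer.zero_grad(set_to_none=True)\n",
| | + "loss = torch.nn.functional.cross_entropy(model(inputs), targets)\n",
| | + "loss.backward()\n",
| | + "optimizer.step()"
| | + ]
| | + },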
164 | 185 | { |
|
222 | 243 | "Enable channels\\_last memory format for computer vision models\n", |
223 | 244 | "==============================================================\n", |
224 | 245 | "\n", |
225 | | - "PyTorch 1.5 introduced support for `channels_last` memory format for\n", |
226 | | - "convolutional networks. This format is meant to be used in conjunction\n", |
227 | | - "with [AMP](https://pytorch.org/docs/stable/amp.html) to further\n", |
228 | | - "accelerate convolutional neural networks with [Tensor\n", |
| 246 | + "PyTorch supports `channels_last` memory format for convolutional\n", |
| 247 | + "networks. This format is meant to be used in conjunction with\n", |
| 248 | + "[AMP](https://pytorch.org/docs/stable/amp.html) to further accelerate\n", |
| 249 | + "convolutional neural networks with [Tensor\n", |
229 | 250 | "Cores](https://www.nvidia.com/en-us/data-center/tensor-cores/).\n", |
230 | 251 | "\n", |
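| | + "As a minimal sketch of opting in (assuming a convolutional `model` and\n",
| | + "an NCHW `input` tensor are already defined):\n",
| | + "\n",
| | + "```python\n",
| | + "model = model.to(memory_format=torch.channels_last)  # convert weights\n",
| | + "input = input.to(memory_format=torch.channels_last)  # convert activations\n",
| | + "output = model(input)  # convolutions select channels_last kernels\n",
| | + "```\n",
| | + "\n",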
231 | 252 | "Support for `channels_last` is experimental, but it\\'s expected to work\n", |
|
439 | 460 | "```\n" |
440 | 461 | ] |
441 | 462 | }, |
442 | | - { |
443 | | - "cell_type": "markdown", |
444 | | - "metadata": {}, |
445 | | - "source": [ |
446 | | - "Use oneDNN Graph with TorchScript for inference\n", |
447 | | - "===============================================\n", |
448 | | - "\n", |
449 | | - "oneDNN Graph can significantly boost inference performance. It fuses\n", |
450 | | - "some compute-intensive operations such as convolution, matmul with their\n", |
451 | | - "neighbor operations. In PyTorch 2.0, it is supported as a beta feature\n", |
452 | | - "for `Float32` & `BFloat16` data-types. oneDNN Graph receives the model's\n", |
453 | | - "graph and identifies candidates for operator-fusion with respect to the\n", |
454 | | - "shape of the example input. A model should be JIT-traced using an\n", |
455 | | - "example input. Speed-up would then be observed after a couple of warm-up\n", |
456 | | - "iterations for inputs with the same shape as the example input. The\n", |
457 | | - "example code-snippets below are for resnet50, but they can very well be\n", |
458 | | - "extended to use oneDNN Graph with custom models as well.\n" |
459 | | - ] |
460 | | - }, |
461 | | - { |
462 | | - "cell_type": "code", |
463 | | - "execution_count": null, |
464 | | - "metadata": { |
465 | | - "collapsed": false |
466 | | - }, |
467 | | - "outputs": [], |
468 | | - "source": [ |
469 | | - "# Only this extra line of code is required to use oneDNN Graph\n", |
470 | | - "torch.jit.enable_onednn_fusion(True)" |
471 | | - ] |
472 | | - }, |
473 | | - { |
474 | | - "cell_type": "markdown", |
475 | | - "metadata": {}, |
476 | | - "source": [ |
477 | | - "Using the oneDNN Graph API requires just one extra line of code for\n", |
478 | | - "inference with Float32. If you are using oneDNN Graph, please avoid\n", |
479 | | - "calling `torch.jit.optimize_for_inference`.\n" |
480 | | - ] |
481 | | - }, |
482 | | - { |
483 | | - "cell_type": "code", |
484 | | - "execution_count": null, |
485 | | - "metadata": { |
486 | | - "collapsed": false |
487 | | - }, |
488 | | - "outputs": [], |
489 | | - "source": [ |
490 | | - "# sample input should be of the same shape as expected inputs\n", |
491 | | - "sample_input = [torch.rand(32, 3, 224, 224)]\n", |
492 | | - "# Using resnet50 from torchvision in this example for illustrative purposes,\n", |
493 | | - "# but the line below can indeed be modified to use custom models as well.\n", |
494 | | - "model = getattr(torchvision.models, \"resnet50\")().eval()\n", |
495 | | - "# Tracing the model with example input\n", |
496 | | - "traced_model = torch.jit.trace(model, sample_input)\n", |
497 | | - "# Invoking torch.jit.freeze\n", |
498 | | - "traced_model = torch.jit.freeze(traced_model)" |
499 | | - ] |
500 | | - }, |
501 | | - { |
502 | | - "cell_type": "markdown", |
503 | | - "metadata": {}, |
504 | | - "source": [ |
505 | | - "Once a model is JIT-traced with a sample input, it can then be used for\n", |
506 | | - "inference after a couple of warm-up runs.\n" |
507 | | - ] |
508 | | - }, |
509 | | - { |
510 | | - "cell_type": "code", |
511 | | - "execution_count": null, |
512 | | - "metadata": { |
513 | | - "collapsed": false |
514 | | - }, |
515 | | - "outputs": [], |
516 | | - "source": [ |
517 | | - "with torch.no_grad():\n", |
518 | | - " # a couple of warm-up runs\n", |
519 | | - " traced_model(*sample_input)\n", |
520 | | - " traced_model(*sample_input)\n", |
521 | | - " # speedup would be observed after warm-up runs\n", |
522 | | - " traced_model(*sample_input)" |
523 | | - ] |
524 | | - }, |
525 | | - { |
526 | | - "cell_type": "markdown", |
527 | | - "metadata": {}, |
528 | | - "source": [ |
529 | | - "While the JIT fuser for oneDNN Graph also supports inference with\n", |
530 | | - "`BFloat16` datatype, performance benefit with oneDNN Graph is only\n", |
531 | | - "exhibited by machines with AVX512\\_BF16 instruction set architecture\n", |
532 | | - "(ISA). The following code snippets serves as an example of using\n", |
533 | | - "`BFloat16` datatype for inference with oneDNN Graph:\n" |
534 | | - ] |
535 | | - }, |
536 | | - { |
537 | | - "cell_type": "code", |
538 | | - "execution_count": null, |
539 | | - "metadata": { |
540 | | - "collapsed": false |
541 | | - }, |
542 | | - "outputs": [], |
543 | | - "source": [ |
544 | | - "# AMP for JIT mode is enabled by default, and is divergent with its eager mode counterpart\n", |
545 | | - "torch._C._jit_set_autocast_mode(False)\n", |
546 | | - "\n", |
547 | | - "with torch.no_grad(), torch.cpu.amp.autocast(cache_enabled=False, dtype=torch.bfloat16):\n", |
548 | | - " # Conv-BatchNorm folding for CNN-based Vision Models should be done with ``torch.fx.experimental.optimization.fuse`` when AMP is used\n", |
549 | | - " import torch.fx.experimental.optimization as optimization\n", |
550 | | - " # Please note that optimization.fuse need not be called when AMP is not used\n", |
551 | | - " model = optimization.fuse(model)\n", |
552 | | - " model = torch.jit.trace(model, (example_input))\n", |
553 | | - " model = torch.jit.freeze(model)\n", |
554 | | - " # a couple of warm-up runs\n", |
555 | | - " model(example_input)\n", |
556 | | - " model(example_input)\n", |
557 | | - " # speedup would be observed in subsequent runs.\n", |
558 | | - " model(example_input)" |
559 | | - ] |
560 | | - }, |
561 | 463 | { |
562 | 464 | "cell_type": "markdown", |
563 | 465 | "metadata": {}, |
|
751 | 653 | " NLP models\n", |
752 | 654 | "- enable AMP\n", |
753 | 655 | " - Introduction to Mixed Precision Training and AMP:\n", |
754 | | - " [video](https://www.youtube.com/watch?v=jF4-_ZK_tyc&feature=youtu.be),\n", |
755 | 656 | " [slides](https://nvlabs.github.io/eccv2020-mixed-precision-tutorial/files/dusan_stosic-training-neural-networks-with-tensor-cores.pdf)\n", |
756 | | - " - native PyTorch AMP is available starting from PyTorch 1.6:\n", |
| 657 | + " - native PyTorch AMP:\n",
757 | 658 | " [documentation](https://pytorch.org/docs/stable/amp.html),\n", |
758 | 659 | " [examples](https://pytorch.org/docs/stable/notes/amp_examples.html#amp-examples),\n", |
759 | 660 | " [tutorial](https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html)\n" |
|
894 | 795 | "by bucketing samples with similar sequence length or even by sorting\n", |
895 | 796 | "the dataset by sequence length.\n"
896 | 797 | ] |
| 798 | + }, |
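| | + {
| | + "cell_type": "markdown",
| | + "metadata": {},
| | + "source": [
| | + "As a minimal sketch of length-bucketing (the random `sequences` list\n",
| | + "below is a stand-in for a real dataset):\n"
| | + ]
| | + },
| | + {
| | + "cell_type": "code",
| | + "execution_count": null,
| | + "metadata": {
| | + "collapsed": false
| | + },
| | + "outputs": [],
| | + "source": [
| | + "from torch.nn.utils.rnn import pad_sequence\n",
| | + "\n",
| | + "# Stand-in dataset: 256 variable-length 1-D tensors\n",
| | + "sequences = [torch.randn(torch.randint(10, 100, (1,)).item()) for _ in range(256)]\n",
| | + "\n",
| | + "# Sort indices by length so each batch pads to a similar size\n",
| | + "order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]))\n",
| | + "\n",
| | + "# Batches built from sorted indices carry minimal padding overhead\n",
| | + "batch_size = 32\n",
| | + "batches = [\n",
| | + "    pad_sequence([sequences[i] for i in order[b:b + batch_size]], batch_first=True)\n",
| | + "    for b in range(0, len(order), batch_size)\n",
| | + "]"
| | + ]
| | + },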
| 799 | + { |
| 800 | + "cell_type": "markdown", |
| 801 | + "metadata": {}, |
| 802 | + "source": [ |
| 803 | + "Conclusion\n", |
| 804 | + "==========\n", |
| 805 | + "\n", |
| 806 | + "This tutorial covered a comprehensive set of performance optimization\n", |
| 807 | + "techniques for PyTorch models. The key takeaways include:\n", |
| 808 | + "\n", |
| 809 | + "- **General optimizations**: Enable async data loading, disable\n", |
| 810 | + " gradients for inference, fuse operations with `torch.compile`, and\n", |
| 811 | + " use efficient memory formats\n", |
| 812 | + "- **CPU optimizations**: Leverage NUMA controls, optimize OpenMP\n", |
| 813 | + " settings, and use efficient memory allocators\n", |
| 814 | + "- **GPU optimizations**: Enable Tensor Cores, use CUDA graphs, enable\n",
| 815 | + " cuDNN autotuner, and implement mixed precision training\n", |
| 816 | + "- **Distributed optimizations**: Use DistributedDataParallel, optimize\n", |
| 817 | + " gradient synchronization, and balance workloads across devices\n", |
| 818 | + "\n", |
| 819 | + "Many of these optimizations can be applied with minimal code changes and\n", |
| 820 | + "provide significant performance improvements across a wide range of deep\n", |
| 821 | + "learning models.\n", |
| 822 | + "\n", |
| 823 | + "Further Reading\n", |
| 824 | + "===============\n", |
| 825 | + "\n", |
| 826 | + "- [PyTorch Performance Tuning\n", |
| 827 | + " Documentation](https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html)\n", |
| 828 | + "- [CUDA Best\n", |
| 829 | + " Practices](https://pytorch.org/docs/stable/notes/cuda.html)\n", |
| 830 | + "- [Distributed Training\n", |
| 831 | + " Documentation](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html)\n", |
| 832 | + "- [Mixed Precision Training](https://pytorch.org/docs/stable/amp.html)\n", |
| 833 | + "- [torch.compile\n", |
| 834 | + " Tutorial](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html)\n" |
| 835 | + ] |
897 | 836 | } |
898 | 837 | ], |
899 | 838 | "metadata": { |
|