Commit 8e30a23

Automated tutorials push
1 parent 93e5353 commit 8e30a23

223 files changed: +23903 -25454 lines changed

_downloads/3195443a0ced3cabc0ad643537bdb5cd/introyt1_tutorial.ipynb

Lines changed: 2 additions & 2 deletions
@@ -34,7 +34,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "bfbd9484",
+"id": "7664dafa",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -50,7 +50,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "698f078d",
+"id": "7204472c",
 "metadata": {},
 "source": [
 "\n",

_downloads/38991cbc7763ed7e0f1b711da737b391/tuning_guide.ipynb

Lines changed: 66 additions & 127 deletions
@@ -28,10 +28,32 @@
 "a few lines of code and can be applied to a wide range of deep learning\n",
 "models across all domains.\n",
 "\n",
+"<div style=\"width: 45%; float: left; padding: 20px;\"><h2> What you will learn</h2><ul><li>General optimization techniques for PyTorch models</li><li>CPU-specific performance optimizations</li><li>GPU acceleration strategies</li><li>Distributed training optimizations</li></ul></div><div style=\"width: 45%; float: right; padding: 20px;\"><h2> Prerequisites</h2><ul><li>PyTorch 2.0 or later</li><li>Python 3.8 or later</li><li>CUDA-capable GPU (recommended for GPU optimizations)</li><li>Linux, macOS, or Windows operating system</li></ul></div>\n",
+"\n",
+"Overview\n",
+"--------\n",
+"\n",
+"Performance optimization is crucial for efficient deep learning model\n",
+"training and inference. This tutorial covers a comprehensive set of\n",
+"techniques to accelerate PyTorch workloads across different hardware\n",
+"configurations and use cases.\n",
+"\n",
 "General optimizations\n",
 "---------------------\n"
 ]
 },
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"collapsed": false
+},
+"outputs": [],
+"source": [
+"import torch\n",
+"import torchvision"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -157,8 +179,7 @@
 "than setting it to zero, for more details refer to the\n",
 "[documentation](https://pytorch.org/docs/master/optim.html#torch.optim.Optimizer.zero_grad).\n",
 "\n",
-"Alternatively, starting from PyTorch 1.7, call `model` or\n",
-"`optimizer.zero_grad(set_to_none=True)`.\n"
+"Alternatively, call `model` or `optimizer.zero_grad(set_to_none=True)`.\n"
 ]
 },
 {
@@ -222,10 +243,10 @@
 "Enable channels\\_last memory format for computer vision models\n",
 "==============================================================\n",
 "\n",
-"PyTorch 1.5 introduced support for `channels_last` memory format for\n",
-"convolutional networks. This format is meant to be used in conjunction\n",
-"with [AMP](https://pytorch.org/docs/stable/amp.html) to further\n",
-"accelerate convolutional neural networks with [Tensor\n",
+"PyTorch supports `channels_last` memory format for convolutional\n",
+"networks. This format is meant to be used in conjunction with\n",
+"[AMP](https://pytorch.org/docs/stable/amp.html) to further accelerate\n",
+"convolutional neural networks with [Tensor\n",
 "Cores](https://www.nvidia.com/en-us/data-center/tensor-cores/).\n",
 "\n",
 "Support for `channels_last` is experimental, but it\\'s expected to work\n",
@@ -439,125 +460,6 @@
 "```\n"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"Use oneDNN Graph with TorchScript for inference\n",
-"===============================================\n",
-"\n",
-"oneDNN Graph can significantly boost inference performance. It fuses\n",
-"some compute-intensive operations such as convolution, matmul with their\n",
-"neighbor operations. In PyTorch 2.0, it is supported as a beta feature\n",
-"for `Float32` & `BFloat16` data-types. oneDNN Graph receives the model's\n",
-"graph and identifies candidates for operator-fusion with respect to the\n",
-"shape of the example input. A model should be JIT-traced using an\n",
-"example input. Speed-up would then be observed after a couple of warm-up\n",
-"iterations for inputs with the same shape as the example input. The\n",
-"example code-snippets below are for resnet50, but they can very well be\n",
-"extended to use oneDNN Graph with custom models as well.\n"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {
-"collapsed": false
-},
-"outputs": [],
-"source": [
-"# Only this extra line of code is required to use oneDNN Graph\n",
-"torch.jit.enable_onednn_fusion(True)"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"Using the oneDNN Graph API requires just one extra line of code for\n",
-"inference with Float32. If you are using oneDNN Graph, please avoid\n",
-"calling `torch.jit.optimize_for_inference`.\n"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {
-"collapsed": false
-},
-"outputs": [],
-"source": [
-"# sample input should be of the same shape as expected inputs\n",
-"sample_input = [torch.rand(32, 3, 224, 224)]\n",
-"# Using resnet50 from torchvision in this example for illustrative purposes,\n",
-"# but the line below can indeed be modified to use custom models as well.\n",
-"model = getattr(torchvision.models, \"resnet50\")().eval()\n",
-"# Tracing the model with example input\n",
-"traced_model = torch.jit.trace(model, sample_input)\n",
-"# Invoking torch.jit.freeze\n",
-"traced_model = torch.jit.freeze(traced_model)"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"Once a model is JIT-traced with a sample input, it can then be used for\n",
-"inference after a couple of warm-up runs.\n"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {
-"collapsed": false
-},
-"outputs": [],
-"source": [
-"with torch.no_grad():\n",
-" # a couple of warm-up runs\n",
-" traced_model(*sample_input)\n",
-" traced_model(*sample_input)\n",
-" # speedup would be observed after warm-up runs\n",
-" traced_model(*sample_input)"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"While the JIT fuser for oneDNN Graph also supports inference with\n",
-"`BFloat16` datatype, performance benefit with oneDNN Graph is only\n",
-"exhibited by machines with AVX512\\_BF16 instruction set architecture\n",
-"(ISA). The following code snippets serves as an example of using\n",
-"`BFloat16` datatype for inference with oneDNN Graph:\n"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {
-"collapsed": false
-},
-"outputs": [],
-"source": [
-"# AMP for JIT mode is enabled by default, and is divergent with its eager mode counterpart\n",
-"torch._C._jit_set_autocast_mode(False)\n",
-"\n",
-"with torch.no_grad(), torch.cpu.amp.autocast(cache_enabled=False, dtype=torch.bfloat16):\n",
-" # Conv-BatchNorm folding for CNN-based Vision Models should be done with ``torch.fx.experimental.optimization.fuse`` when AMP is used\n",
-" import torch.fx.experimental.optimization as optimization\n",
-" # Please note that optimization.fuse need not be called when AMP is not used\n",
-" model = optimization.fuse(model)\n",
-" model = torch.jit.trace(model, (example_input))\n",
-" model = torch.jit.freeze(model)\n",
-" # a couple of warm-up runs\n",
-" model(example_input)\n",
-" model(example_input)\n",
-" # speedup would be observed in subsequent runs.\n",
-" model(example_input)"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -751,9 +653,8 @@
 " NLP models\n",
 "- enable AMP\n",
 " - Introduction to Mixed Precision Training and AMP:\n",
-" [video](https://www.youtube.com/watch?v=jF4-_ZK_tyc&feature=youtu.be),\n",
 " [slides](https://nvlabs.github.io/eccv2020-mixed-precision-tutorial/files/dusan_stosic-training-neural-networks-with-tensor-cores.pdf)\n",
-" - native PyTorch AMP is available starting from PyTorch 1.6:\n",
+" - native PyTorch AMP is available:\n",
 " [documentation](https://pytorch.org/docs/stable/amp.html),\n",
 " [examples](https://pytorch.org/docs/stable/notes/amp_examples.html#amp-examples),\n",
 " [tutorial](https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html)\n"
@@ -894,6 +795,44 @@
 "by bucketing samples with similar sequence length or even by sorting\n",
 "dataset by sequence length.\n"
 ]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Conclusion\n",
+"==========\n",
+"\n",
+"This tutorial covered a comprehensive set of performance optimization\n",
+"techniques for PyTorch models. The key takeaways include:\n",
+"\n",
+"- **General optimizations**: Enable async data loading, disable\n",
+" gradients for inference, fuse operations with `torch.compile`, and\n",
+" use efficient memory formats\n",
+"- **CPU optimizations**: Leverage NUMA controls, optimize OpenMP\n",
+" settings, and use efficient memory allocators\n",
+"- **GPU optimizations**: Enable Tensor cores, use CUDA graphs, enable\n",
+" cuDNN autotuner, and implement mixed precision training\n",
+"- **Distributed optimizations**: Use DistributedDataParallel, optimize\n",
+" gradient synchronization, and balance workloads across devices\n",
+"\n",
+"Many of these optimizations can be applied with minimal code changes and\n",
+"provide significant performance improvements across a wide range of deep\n",
+"learning models.\n",
+"\n",
+"Further Reading\n",
+"===============\n",
+"\n",
+"- [PyTorch Performance Tuning\n",
+" Documentation](https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html)\n",
+"- [CUDA Best\n",
+" Practices](https://pytorch.org/docs/stable/notes/cuda.html)\n",
+"- [Distributed Training\n",
+" Documentation](https://pytorch.org/tutorials/intermediate/ddp_tutorial.html)\n",
+"- [Mixed Precision Training](https://pytorch.org/docs/stable/amp.html)\n",
+"- [torch.compile\n",
+" Tutorial](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html)\n"
+]
 }
 ],
 "metadata": {

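Note on the tuning_guide.ipynb hunk above: its last context lines recommend "bucketing samples with similar sequence length or even by sorting dataset by sequence length". The sketch below is one possible reading of that advice in plain Python, not code from the tutorial. The helper names `bucket_by_length` and `padded_cost` and the sample lengths are hypothetical, and the sketch assumes each batch is padded to its longest sample.

```python
from typing import List, Sequence


def bucket_by_length(lengths: Sequence[int], batch_size: int) -> List[List[int]]:
    """Group sample indices into batches of similar sequence length.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Sort indices by sequence length; sorted() is stable, so ties keep dataset order.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    # Cut the sorted order into consecutive chunks of at most batch_size indices.
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def padded_cost(lengths: Sequence[int], batches: List[List[int]]) -> int:
    """Total positions processed if every batch is padded to its longest sample."""
    return sum(max(lengths[i] for i in batch) * len(batch) for batch in batches)


# Made-up sequence lengths that alternate between short and long samples.
lengths = [5, 120, 7, 118, 6, 121, 8, 119]
naive = [[0, 1, 2, 3], [4, 5, 6, 7]]  # batches taken in dataset order
bucketed = bucket_by_length(lengths, batch_size=4)

print("naive padded cost:   ", padded_cost(lengths, naive))
print("bucketed padded cost:", padded_cost(lengths, bucketed))
```

In training, the resulting index batches would typically be shuffled at the batch level each epoch so that the model does not always see short sequences first; that step is omitted here.
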
_downloads/4355e2cef7d17548f1e25f97a62828c4/template_tutorial.ipynb

Lines changed: 2 additions & 2 deletions
@@ -31,7 +31,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "38143db6",
+"id": "516fd8da",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -47,7 +47,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "14c2d630",
+"id": "caf5c545",
 "metadata": {},
 "source": [
 "\n",

_downloads/63a0f0fc7b3ffb15d3a5ac8db3d521ee/tensors_deeper_tutorial.ipynb

Lines changed: 2 additions & 2 deletions
@@ -34,7 +34,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "37ed5738",
+"id": "de40642c",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -50,7 +50,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "4d05cb50",
+"id": "4d452def",
 "metadata": {},
 "source": [
 "\n",

_downloads/770632dd3941d2a51b831c52ded57aa2/trainingyt.ipynb

Lines changed: 2 additions & 2 deletions
@@ -35,7 +35,7 @@
 {
 "cell_type": "code",
 "execution_count": null,
-"id": "060b2582",
+"id": "5b434bf1",
 "metadata": {},
 "outputs": [],
 "source": [
@@ -51,7 +51,7 @@
 },
 {
 "cell_type": "markdown",
-"id": "7ae9c5e3",
+"id": "29dc637b",
 "metadata": {},
 "source": [
 "\n",
