
Commit 6acfa55

Merge branch 'dcp_async_save' of github.com:pytorch/tutorials into dcp_async_save
2 parents 51a9b61 + 0a483b1

5 files changed: +13 -9 lines changed

advanced_source/cpp_export.rst

Lines changed: 1 addition & 1 deletion
@@ -203,7 +203,7 @@ minimal ``CMakeLists.txt`` to build it could look as simple as:
 
 add_executable(example-app example-app.cpp)
 target_link_libraries(example-app "${TORCH_LIBRARIES}")
-set_property(TARGET example-app PROPERTY CXX_STANDARD 14)
+set_property(TARGET example-app PROPERTY CXX_STANDARD 17)
 
 The last thing we need to build the example application is the LibTorch
 distribution. You can always grab the latest stable release from the `download
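
For context, a minimal sketch of the Python side that produces the serialized TorchScript module the C++ ``example-app`` loads; the model choice and file name here are illustrative assumptions, not part of the commit.

# Hypothetical companion script: trace a model and save it for the C++ example app.
import torch
import torchvision

model = torchvision.models.resnet18()                 # any traceable model works here
model.eval()

example_input = torch.rand(1, 3, 224, 224)            # dummy input used to record the trace
traced_module = torch.jit.trace(model, example_input)
traced_module.save("traced_resnet_model.pt")          # file the C++ app passes to torch::jit::load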

advanced_source/super_resolution_with_onnxruntime.py

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
 * ``torch.onnx.export`` is based on TorchScript backend and has been available since PyTorch 1.2.0.
 
 In this tutorial, we describe how to convert a model defined
-in PyTorch into the ONNX format using the TorchScript ``torch.onnx.export` ONNX exporter.
+in PyTorch into the ONNX format using the TorchScript ``torch.onnx.export`` ONNX exporter.
 
 The exported model will be executed with ONNX Runtime.
 ONNX Runtime is a performance-focused engine for ONNX models,
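
As a quick illustration of the exporter the corrected sentence refers to, a minimal self-contained sketch; the toy model and file name are assumptions, not the tutorial's super-resolution model.

# Hypothetical toy export using the TorchScript-based torch.onnx.export exporter.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU())
model.eval()

dummy_input = torch.randn(1, 1, 224, 224)             # example input that fixes graph shapes
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",                                      # file later loaded by ONNX Runtime
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},  # keep batch dimension dynamic
)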

intermediate_source/inductor_debug_cpu.py

Lines changed: 2 additions & 2 deletions
@@ -87,9 +87,9 @@ def neg1(x):
 # +-----------------------------+----------------------------------------------------------------+
 # | ``fx_graph_transformed.py`` | Transformed FX graph, after pattern match                      |
 # +-----------------------------+----------------------------------------------------------------+
-# | ``ir_post_fusion.txt``      | Inductor IR before fusion                                       |
+# | ``ir_pre_fusion.txt``       | Inductor IR before fusion                                       |
 # +-----------------------------+----------------------------------------------------------------+
-# | ``ir_pre_fusion.txt``       | Inductor IR after fusion                                        |
+# | ``ir_post_fusion.txt``      | Inductor IR after fusion                                        |
 # +-----------------------------+----------------------------------------------------------------+
 # | ``output_code.py``          | Generated Python code for graph, with C++/Triton kernels       |
 # +-----------------------------+----------------------------------------------------------------+
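
As background on where the files listed in this table come from, a small sketch of generating them; the toy function body is an assumption, and the exact layout of the debug output directory can differ between PyTorch versions.

# Running a compiled function with TORCH_COMPILE_DEBUG=1 dumps the debug artifacts
# (fx_graph_transformed.py, ir_pre_fusion.txt, ir_post_fusion.txt, output_code.py, ...)
# under a torch_compile_debug/ directory.
import os
os.environ["TORCH_COMPILE_DEBUG"] = "1"   # must be set before compilation is triggered

import torch

@torch.compile
def neg1(x):
    return -x

neg1(torch.randn(4, 4))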

recipes_source/distributed_async_checkpoint_recipe.rst

Lines changed: 8 additions & 4 deletions
@@ -1,12 +1,13 @@
 Asynchronous Saving with Distributed Checkpoint (DCP)
 =====================================================
 
+**Author:** `Lucas Pasqualin <https://github.com/lucasllc>`__, `Iris Zhang <https://github.com/wz337>`__, `Rodrigo Kumpera <https://github.com/kumpera>`__, `Chien-Chin Huang <https://github.com/fegin>`__
+
 Checkpointing is often a bottle-neck in the critical path for distributed training workloads, incurring larger and larger costs as both model and world sizes grow.
 One excellent strategy for offsetting this cost is to checkpoint in parallel, asynchronously. Below, we expand the save example
 from the `Getting Started with Distributed Checkpoint Tutorial <https://github.com/pytorch/tutorials/blob/main/recipes_source/distributed_checkpoint_recipe.rst>`__
 to show how this can be integrated quite easily with ``torch.distributed.checkpoint.async_save``.
 
-**Author**: , `Lucas Pasqualin <https://github.com/lucasllc>`__, `Iris Zhang <https://github.com/wz337>`__, `Rodrigo Kumpera <https://github.com/kumpera>`__, `Chien-Chin Huang <https://github.com/fegin>`__
 
 .. grid:: 2
 
@@ -156,9 +157,12 @@ If the above optimization is still not performant enough, you can take advantage
 Specifically, this optimization attacks the main overhead of asynchronous checkpointing, which is the in-memory copying to checkpointing buffers. By maintaining a pinned memory buffer between
 checkpoint requests users can take advantage of direct memory access to speed up this copy.
 
-.. note:: The main drawback of this optimization is the persistence of the buffer in between checkpointing steps. Without the pinned memory optimization (as demonstrated above),
-   any checkpointing buffers are released as soon as checkpointing is finished. With the pinned memory implementation, this buffer is maintained between steps, leading to the same
-   peak memory pressure being sustained through the application life.
+.. note::
+   The main drawback of this optimization is the persistence of the buffer in between checkpointing steps. Without
+   the pinned memory optimization (as demonstrated above), any checkpointing buffers are released as soon as
+   checkpointing is finished. With the pinned memory implementation, this buffer is maintained between steps,
+   leading to the same
+   peak memory pressure being sustained through the application life.
 
 
 .. code-block:: python
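
For orientation, a condensed sketch of the asynchronous-save pattern this recipe demonstrates; the training-loop names (``dataloader``, ``train_step``, ``model``, ``optimizer``, ``save_every``) and the checkpoint path are hypothetical placeholders for illustration.

# Condensed sketch: overlap checkpointing with training via dcp.async_save.
import torch.distributed.checkpoint as dcp

checkpoint_future = None
for step, batch in enumerate(dataloader):             # assumed user-defined dataloader
    train_step(model, optimizer, batch)               # assumed user-defined training step

    if step % save_every == 0:
        if checkpoint_future is not None:
            checkpoint_future.result()                # wait only if the previous save is still in flight
        state_dict = {"model": model.state_dict(), "optimizer": optimizer.state_dict()}
        checkpoint_future = dcp.async_save(state_dict, checkpoint_id=f"checkpoint_step_{step}")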

recipes_source/distributed_device_mesh.rst

Lines changed: 1 addition & 1 deletion
@@ -156,4 +156,4 @@ they can be used to describe the layout of devices across the cluster.
 For more information, please see the following:
 
 - `2D parallel combining Tensor/Sequance Parallel with FSDP <https://github.com/pytorch/examples/blob/main/distributed/tensor_parallelism/fsdp_tp_example.py>`__
-- `Composable PyTorch Distributed with PT2 <chrome-extension://efaidnbmnnnibpcajpcglclefindmkaj/https://static.sched.com/hosted_files/pytorch2023/d1/%5BPTC%2023%5D%20Composable%20PyTorch%20Distributed%20with%20PT2.pdf>`__
+- `Composable PyTorch Distributed with PT2 <https://static.sched.com/hosted_files/pytorch2023/d1/%5BPTC%2023%5D%20Composable%20PyTorch%20Distributed%20with%20PT2.pdf>`__
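
For context, a brief sketch of the DeviceMesh API this recipe covers; the 8-GPU layout and dimension names are illustrative assumptions, and the script would be launched with ``torchrun`` so that every rank participates.

# Hypothetical 2D mesh over 8 GPUs: outer dim for data parallel, inner dim for tensor parallel.
from torch.distributed.device_mesh import init_device_mesh

mesh_2d = init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))

dp_group = mesh_2d["dp"].get_group()   # process group spanning the data-parallel dimension
tp_group = mesh_2d["tp"].get_group()   # process group spanning the tensor-parallel dimension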
