
Commit 8a6f90d

Vincent Moens committed

amend

1 parent 96e703a commit 8a6f90d

File tree

1 file changed: +6 / -1 lines changed


intermediate_source/pinmem_nonblock.py

Lines changed: 6 additions & 1 deletion
@@ -487,7 +487,7 @@ def pin_copy_to_device_nonblocking(*tensors):
 #
 # Additionally, ``TensorDict.to()`` includes a ``non_blocking_pin`` option which initiates multiple threads to execute
 # ``pin_memory()`` before proceeding with ``to(device)``.
-# This approach can further accelerate data transfers, as demonstrated in the following example:
+# This approach can further accelerate data transfers, as demonstrated in the following example.
 #
 # .. code-block:: bash
 #
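A minimal sketch of the ``non_blocking_pin`` option described in this hunk. It assumes the ``tensordict`` package is installed and that ``num_threads`` is the keyword controlling how many pinning threads are spawned; both are assumptions about the library API rather than facts stated in this commit, and the snippet degrades gracefully when no GPU is present.

```python
import torch  # assumed available, as in the rest of the tutorial

on_gpu = None  # stays None when no CUDA device is available
if torch.cuda.is_available():
    from tensordict import TensorDict

    # A batch of large tensors living in pageable CPU memory.
    td = TensorDict(
        {f"tensor{i}": torch.randn(1024, 1024) for i in range(8)},
        batch_size=[],
    )

    # Pin every leaf tensor on worker threads, then copy the whole
    # structure to the GPU. ``num_threads=4`` is a hypothetical setting.
    td_cuda = td.to("cuda", non_blocking_pin=True, num_threads=4)
    on_gpu = all(t.device.type == "cuda" for t in td_cuda.values())
```

As with any thread-pool size, the best ``num_threads`` value depends on how many tensors there are and how large they are, which is exactly the trade-off the commit's added paragraph discusses.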
@@ -536,6 +536,11 @@ def pin_copy_to_device_nonblocking(*tensors):
 plt.show()

 ######################################################################
+# In this example, we are transferring many large tensors from the CPU to the GPU.
+# This scenario is ideal for utilizing multithreaded ``pin_memory()``, which can significantly enhance performance.
+# However, if the tensors are small, the overhead associated with multithreading may outweigh the benefits.
+# Similarly, if there are only a few tensors, the advantages of pinning tensors on separate threads become limited.
+#
 # As an additional note, while it might seem advantageous to create permanent buffers in pinned memory to shuttle
 # tensors from pageable memory before transferring them to the GPU, this strategy does not necessarily expedite
 # computation. The inherent bottleneck caused by copying data into pinned memory remains a limiting factor.
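The multithreaded pinning that the added paragraph describes can be sketched with a plain thread pool, independent of ``TensorDict``. This is an illustrative approximation of the idea, not the library's actual implementation; ``pin_all_threaded`` and its ``num_threads`` parameter are names invented here.

```python
from concurrent.futures import ThreadPoolExecutor

import torch


def pin_all_threaded(tensors, num_threads=4):
    """Pin each tensor on a worker thread, as a prelude to a device copy."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pin_memory() releases the GIL while it copies into page-locked
        # memory, so several pins can genuinely overlap.
        return list(pool.map(torch.Tensor.pin_memory, tensors))


all_pinned = None  # stays None on CPU-only builds, where pinning is unavailable
if torch.cuda.is_available():
    tensors = [torch.randn(1024, 1024) for _ in range(8)]
    pinned = pin_all_threaded(tensors)
    all_pinned = all(t.is_pinned() for t in pinned)
```

With few or small tensors the pool's startup and scheduling overhead dominates, which is the caveat the commit adds to the tutorial text.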
