11"""
Profiling your PyTorch Module
-----------------------------
**Author:** `Suraj Subramanian <https://github.com/suraj813>`_

**Translator:** `이재복 <http://github.com/zzaebok>`_

PyTorch includes a profiler API that is useful to identify the time and
memory costs of various PyTorch operations in your code. The profiler can be
easily integrated into your code, and the results can be printed as a table
or returned as a JSON trace file.

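As a minimal sketch of both output modes (assuming ``prof`` is the result
object produced by the ``profiler.profile`` context manager used later in
this tutorial):

.. code-block:: python

    # aggregated stats rendered as a table
    print(prof.key_averages().table(sort_by="self_cpu_time_total"))

    # JSON trace viewable in chrome://tracing
    prof.export_chrome_trace("trace.json")
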
.. note::
    Profiler supports multithreaded models. Profiler runs in the
    same thread as the operation but it will also profile child operators
    that might run in another thread. Concurrently-running profilers will be
    scoped to their own thread to prevent mixing of results.

.. note::
    PyTorch 1.8 introduces the new API that will replace the older profiler API
    in the future releases. Check the new API at `this page <https://pytorch.org/docs/master/profiler.html>`__.
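
    A minimal sketch of that newer API (names as on the linked page; the rest
    of this tutorial sticks with ``torch.autograd.profiler``):

    .. code-block:: python

        from torch.profiler import profile, ProfilerActivity

        with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
            out, idx = model(input, mask)
        print(prof.key_averages().table(sort_by="cpu_time_total"))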

Head on over to `this
recipe <https://tutorials.pytorch.kr/recipes/recipes/profiler_recipe.html>`__
for a quicker walkthrough of Profiler API usage.


--------------
"""

import torch
import numpy as np
from torch import nn
import torch.autograd.profiler as profiler


######################################################################
# Performance debugging using Profiler
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# Profiler can be useful to identify performance bottlenecks in your
# models. In this example, we build a custom module that performs two
# sub-tasks:
#
# - a linear transformation on the input, and
# - use the transformation result to get indices on a mask tensor.
#
# We wrap the code for each sub-task in separate labelled context managers using
# ``profiler.record_function("label")``. In the profiler output, the
# aggregate performance metrics of all operations in the sub-task will
# show up under its corresponding label.
#
#
# Note that using Profiler incurs some overhead, and is best used only for investigating
# code. Remember to remove it if you are benchmarking runtimes.
#

class MyModule(nn.Module):
    # NOTE: the class body is elided in this excerpt; the sketch below is
    # reconstructed from the surrounding description, and the exact threshold
    # heuristic is an assumption.
    def __init__(self, in_features: int, out_features: int, bias: bool = True):
        super(MyModule, self).__init__()
        self.linear = nn.Linear(in_features, out_features, bias)

    def forward(self, input, mask):
        with profiler.record_function("LINEAR PASS"):
            out = self.linear(input)

        with profiler.record_function("MASK INDICES"):
            threshold = out.sum(axis=1).mean().item()
            # device-to-host copy for NumPy, then host-to-device copy back:
            # the costly steps the profiler flags below
            hi_idx = np.argwhere(mask.cpu().numpy() > threshold)
            hi_idx = torch.from_numpy(hi_idx).cuda()

        return out, hi_idx


######################################################################
# Profile the forward pass
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# We initialize random input and mask tensors, and the model.
#
# Before we run the profiler, we warm-up CUDA to ensure accurate
# performance benchmarking. We wrap the forward pass of our module in the
# ``profiler.profile`` context manager. The ``with_stack=True`` parameter appends the
# file and line number of the operation in the trace.
#
# .. WARNING::
#     ``with_stack=True`` incurs an additional overhead, and is better suited for investigating code.
#     Remember to remove it if you are benchmarking performance.
#

model = MyModule(500, 10).cuda()
input = torch.rand(128, 500).cuda()
mask = torch.rand((500, 500, 500), dtype=torch.double).cuda()

# warm-up
model(input, mask)

with profiler.profile(with_stack=True, profile_memory=True) as prof:
    out, idx = model(input, mask)

######################################################################
# Print profiler results
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# Finally, we print the profiler results. ``profiler.key_averages``
# aggregates the results by operator name, and optionally by input
# shapes and/or stack trace events.
# Grouping by input shapes is useful to identify which tensor shapes
# are utilized by the model.
#
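# As a sketch of the shape-grouping variant (note that it assumes the
# profiler was run with ``record_shapes=True``, which the runs in this
# tutorial do not pass):
#
# .. code-block:: python
#
#     with profiler.profile(record_shapes=True) as prof:
#         out, idx = model(input, mask)
#     print(prof.key_averages(group_by_input_shape=True).table(sort_by="self_cpu_time_total"))
#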
# Here, we use ``group_by_stack_n=5``, which aggregates runtimes by the
# operation and its traceback (truncated to the most recent 5 events), and
# displays the events in the order they are registered. The table can also
# be sorted by passing a ``sort_by`` argument (refer to the
# `docs <https://pytorch.org/docs/stable/autograd.html#profiler>`__ for
# valid sorting keys).
#
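# For example, to surface the most memory-hungry operators instead, sort by
# a memory key (key names are listed in the linked docs):
#
# .. code-block:: python
#
#     print(prof.key_averages(group_by_stack_n=5).table(sort_by='self_cpu_memory_usage', row_limit=5))
#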
# .. Note::
#     When running profiler in a notebook, you might see entries like ``<ipython-input-18-193a910735e8>(13): forward``
#     instead of filenames in the stacktrace. These correspond to ``<notebook-cell>(line number): calling-function``.

print(prof.key_averages(group_by_stack_n=5).table(sort_by='self_cpu_time_total', row_limit=5))

"""
(profiler output table elided from this excerpt)
"""

######################################################################
# Improve memory performance
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Note that the most expensive operations - in terms of memory and time -
# are at ``forward (10)``, representing the operations within MASK INDICES. Let's try to
# tackle the memory consumption first. We can see that the ``.to()``
# operation at line 12 consumes 953.67 Mb. This operation copies ``mask`` to the CPU.
# ``mask`` is initialized with a ``torch.double`` datatype. Can we reduce the memory footprint by casting
# it to ``torch.float`` instead?
#
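# A back-of-the-envelope check of why the cast should roughly halve the
# footprint: ``torch.double`` elements take 8 bytes, ``torch.float`` take 4.
#
# .. code-block:: python
#
#     torch.tensor(0, dtype=torch.double).element_size()  # 8 bytes
#     torch.tensor(0, dtype=torch.float).element_size()   # 4 bytes
#     500 * 500 * 500 * 8 / 2**20  # ~953.67, the Mb figure reported above
#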

model = MyModule(500, 10).cuda()
input = torch.rand(128, 500).cuda()
mask = torch.rand((500, 500, 500), dtype=torch.float).cuda()

# warm-up
model(input, mask)

with profiler.profile(with_stack=True, profile_memory=True) as prof:
    out, idx = model(input, mask)

print(prof.key_averages(group_by_stack_n=5).table(sort_by='self_cpu_time_total', row_limit=5))

"""
(profiler output table elided from this excerpt)
"""

######################################################################
#
# The CPU memory footprint for this operation has halved.
#
# Improve time performance
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# While the time consumed has also reduced a bit, it's still too high.
# Turns out copying a matrix from CUDA to CPU is pretty expensive!
# The ``aten::copy_`` operator in ``forward (12)`` copies ``mask`` to CPU
# so that it can use the NumPy ``argwhere`` function. ``aten::copy_`` at ``forward(13)``
# copies the array back to CUDA as a tensor. We could eliminate both of these if we use a
# ``torch`` function ``nonzero()`` here instead.
#
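# A toy sketch of the two index-finding paths (shapes here are made up for
# illustration; both produce the same indices, but ``nonzero()`` never
# leaves the GPU):
#
# .. code-block:: python
#
#     t = torch.rand(4, 4).cuda()
#     # round-trip: device-to-host copy, NumPy argwhere, host-to-device copy
#     idx_np = torch.from_numpy(np.argwhere(t.cpu().numpy() > 0.5)).cuda()
#     # single path that stays on the device
#     idx_t = (t > 0.5).nonzero()
#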

class MyModule(nn.Module):
    # NOTE: body elided in this excerpt; identical to the first version except
    # MASK INDICES, which now uses ``nonzero()`` as described above.
    def __init__(self, in_features: int, out_features: int, bias: bool = True):
        super(MyModule, self).__init__()
        self.linear = nn.Linear(in_features, out_features, bias)

    def forward(self, input, mask):
        with profiler.record_function("LINEAR PASS"):
            out = self.linear(input)

        with profiler.record_function("MASK INDICES"):
            threshold = out.sum(axis=1).mean().item()
            hi_idx = (mask > threshold).nonzero()

        return out, hi_idx

model = MyModule(500, 10).cuda()
input = torch.rand(128, 500).cuda()
mask = torch.rand((500, 500, 500), dtype=torch.float).cuda()

# warm-up
model(input, mask)

with profiler.profile(with_stack=True, profile_memory=True) as prof:
    out, idx = model(input, mask)

print(prof.key_averages(group_by_stack_n=5).table(sort_by='self_cpu_time_total', row_limit=5))

"""
(profiler output table elided from this excerpt)
"""


######################################################################
# Further Reading
# ~~~~~~~~~~~~~~~~~
# We have seen how Profiler can be used to investigate time and memory bottlenecks in PyTorch models.
# Read more about Profiler here:
#
# - `Profiler Usage Recipe <https://tutorials.pytorch.kr/recipes/recipes/profiler_recipe.html>`__
# - `Profiling RPC-Based Workloads <https://tutorials.pytorch.kr/recipes/distributed_rpc_profiling.html>`__
# - `Profiler API Docs <https://pytorch.org/docs/stable/autograd.html?highlight=profiler#profiler>`__