
Commit d58dca0

Rebuild
1 parent 49c3820 commit d58dca0

File tree: 170 files changed (+1366, -1447 lines)


docs/_downloads/0f7092cb660ccb79962f314758ab0b78/pipeline_tutorial.py

Lines changed: 109 additions & 117 deletions (large diff not rendered)

docs/_downloads/557779fe1d2bfa1a29f7f8ccca39884d/profiler.ipynb

Lines changed: 10 additions & 10 deletions (large diff not rendered)

docs/_downloads/71ad2fed5ce61e4323d32ba38b8594a7/profiler.py

Lines changed: 64 additions & 72 deletions
@@ -1,26 +1,24 @@
 """
-Profiling your PyTorch Module
-------------
+Profiling your PyTorch Module
+---------------------------
 **Author:** `Suraj Subramanian <https://github.com/suraj813>`_
 
-PyTorch includes a profiler API that is useful to identify the time and
-memory costs of various PyTorch operations in your code. Profiler can be
-easily integrated in your code, and the results can be printed as a table
-or returned in a JSON trace file.
+**Translation:** `이재복 <http://github.com/zzaebok>`_
+
+PyTorch includes a profiler API that is useful for identifying the time and memory costs of the various PyTorch operations in your code.
+The profiler can be integrated into your code easily, and the profiling results can be printed as a table or returned in a JSON trace file.
 
 .. note::
-    Profiler supports multithreaded models. Profiler runs in the
-    same thread as the operation but it will also profile child operators
-    that might run in another thread. Concurrently-running profilers will be
-    scoped to their own thread to prevent mixing of results.
+    The profiler supports multithreaded models.
+    The profiler runs in the same thread as the operation, but it will also profile child operators
+    that may run in another thread.
+    Concurrently running profilers are scoped to their own threads to prevent results from mixing.
 
 .. note::
-    PyTorch 1.8 introduces the new API that will replace the older profiler API
-    in the future releases. Check the new API at `this page <https://pytorch.org/docs/master/profiler.html>`__.
+    PyTorch 1.8 introduces a new API that will replace the older profiler API in future releases.
+    Check out the new API at `this page <https://pytorch.org/docs/master/profiler.html>`__.
 
-Head on over to `this
-recipe <https://tutorials.pytorch.kr/recipes/recipes/profiler_recipe.html>`__
-for a quicker walkthrough of Profiler API usage.
+For a quick walkthrough of profiler API usage, see `this recipe <https://tutorials.pytorch.kr/recipes/recipes/profiler_recipe.html>`__.
 
 
 --------------
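For reference, a minimal sketch of the usage pattern the docstring above describes: profile a block of code, print the results as a table, and export a JSON trace. This is an illustration assuming a PyTorch 1.8-era ``torch.autograd.profiler``, not code from this file.

    import torch
    import torch.autograd.profiler as profiler

    x = torch.randn(64, 64)

    # Profile one matrix multiply, tracking memory as well as time.
    with profiler.profile(profile_memory=True) as prof:
        y = x @ x

    # Results as a table...
    print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=5))
    # ...or as a JSON trace file, viewable in chrome://tracing.
    prof.export_chrome_trace("trace.json")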
@@ -33,24 +31,22 @@
 
 
 ######################################################################
-# Performance debugging using Profiler
+# Performance debugging using the profiler
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 #
-# Profiler can be useful to identify performance bottlenecks in your
-# models. In this example, we build a custom module that performs two
-# sub-tasks:
+# The profiler can be useful for identifying performance bottlenecks in your models.
+# In this example, we build a custom module that performs two sub-tasks:
 #
-# - a linear transformation on the input, and
-# - use the transformation result to get indices on a mask tensor.
+# - a linear transformation on the input, and
+# - using the transformation result to get indices from a mask tensor.
 #
-# We wrap the code for each sub-task in separate labelled context managers using
-# ``profiler.record_function("label")``. In the profiler output, the
-# aggregate performance metrics of all operations in the sub-task will
-# show up under its corresponding label.
+# The code for each sub-task is wrapped in a separate labelled context manager using
+# ``profiler.record_function("label")``.
+# In the profiler output, the aggregate performance metrics of all operations in a sub-task appear under the corresponding label.
 #
 #
-# Note that using Profiler incurs some overhead, and is best used only for investigating
-# code. Remember to remove it if you are benchmarking runtimes.
+# Note that using the profiler incurs some overhead, so it is best used only when investigating code.
+# Remember to remove it if you are benchmarking runtimes.
 #
 
 class MyModule(nn.Module):
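The body of ``MyModule`` is truncated in this diff. As a rough reconstruction from the surrounding text (the ``MASK INDICES`` label appears in a later hunk; the ``LINEAR PASS`` label and the exact threshold logic are assumptions), the module with its two labelled sub-tasks looks something like:

    import numpy as np
    import torch
    import torch.nn as nn
    import torch.autograd.profiler as profiler

    class MyModule(nn.Module):
        def __init__(self, in_features, out_features, bias=True):
            super(MyModule, self).__init__()
            self.linear = nn.Linear(in_features, out_features, bias)

        def forward(self, input, mask):
            # Sub-task 1: a linear transformation on the input.
            with profiler.record_function("LINEAR PASS"):
                out = self.linear(input)

            # Sub-task 2: use the result to get indices on the mask tensor.
            with profiler.record_function("MASK INDICES"):
                threshold = out.sum(axis=1).mean().item()
                # np.argwhere forces a CUDA-to-CPU copy of `mask`; the
                # "Improving time performance" hunk below eliminates it.
                hi_idx = np.argwhere(mask.cpu().numpy() > threshold)
                hi_idx = torch.from_numpy(hi_idx).cuda()

            return out, hi_idx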
@@ -71,52 +67,49 @@ def forward(self, input, mask):
 
 
 ######################################################################
-# Profile the forward pass
-# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+# Profiling the forward pass
+# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 #
-# We initialize random input and mask tensors, and the model.
+# We randomly initialize the input and mask tensors, and the model.
 #
-# Before we run the profiler, we warm-up CUDA to ensure accurate
-# performance benchmarking. We wrap the forward pass of our module in the
-# ``profiler.profile`` context manager. The ``with_stack=True`` parameter appends the
-# file and line number of the operation in the trace.
+# Before running the profiler, we warm up CUDA to ensure accurate performance benchmarking.
+# We wrap the forward pass of the module in the ``profiler.profile`` context manager.
+# The ``with_stack=True`` argument appends the file and line number of each operation to the trace.
 #
 # .. WARNING::
-#     ``with_stack=True`` incurs an additional overhead, and is better suited for investigating code.
-#     Remember to remove it if you are benchmarking performance.
+#     ``with_stack=True`` incurs additional overhead, so it is better suited for investigating code.
+#     Remember to remove it if you are benchmarking performance.
 #
 
 model = MyModule(500, 10).cuda()
 input = torch.rand(128, 500).cuda()
 mask = torch.rand((500, 500, 500), dtype=torch.double).cuda()
 
-# warm-up
+# warm-up
 model(input, mask)
 
 with profiler.profile(with_stack=True, profile_memory=True) as prof:
     out, idx = model(input, mask)
 
 
 ######################################################################
-# Print profiler results
+# Printing profiler results
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 #
-# Finally, we print the profiler results. ``profiler.key_averages``
-# aggregates the results by operator name, and optionally by input
-# shapes and/or stack trace events.
-# Grouping by input shapes is useful to identify which tensor shapes
-# are utilized by the model.
+# Finally, we print the profiler results.
+# ``profiler.key_averages`` aggregates the results by operator name,
+# and optionally by input shape and/or stack trace events.
+# Grouping by input shape is useful for identifying which tensor shapes the model uses.
 #
-# Here, we use ``group_by_stack_n=5`` which aggregates runtimes by the
-# operation and its traceback (truncated to the most recent 5 events), and
-# display the events in the order they are registered. The table can also
-# be sorted by passing a ``sort_by`` argument (refer to the
-# `docs <https://pytorch.org/docs/stable/autograd.html#profiler>`__ for
-# valid sorting keys).
+# Here we use ``group_by_stack_n=5``, which aggregates runtimes by the operation and its traceback
+# (truncated to the 5 most recent events) and displays the events in the order they were registered.
+# The table can also be sorted by passing a ``sort_by`` argument
+# (see the `docs <https://pytorch.org/docs/stable/autograd.html#profiler>`__ for valid sorting keys).
 #
 # .. Note::
-#    When running profiler in a notebook, you might see entries like ``<ipython-input-18-193a910735e8>(13): forward``
-#    instead of filenames in the stacktrace. These correspond to ``<notebook-cell>(line number): calling-function``.
+#    When running the profiler in a notebook, you might see entries like
+#    ``<ipython-input-18-193a910735e8>(13): forward`` instead of filenames in the stack trace.
+#    These correspond to ``<notebook-cell>(line number): calling-function``.
 
 print(prof.key_averages(group_by_stack_n=5).table(sort_by='self_cpu_time_total', row_limit=5))
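As an aside, the same ``prof`` object supports the other grouping and sort keys the text mentions. A sketch, assuming the ``prof`` from the hunk above (note that input shapes are only populated if ``profiler.profile`` is called with ``record_shapes=True``):

    # Group by operator name and input shape, sorted by memory instead of time;
    # "self_cpu_memory_usage" is one of the documented sort keys.
    print(prof.key_averages(group_by_input_shape=True).table(
        sort_by="self_cpu_memory_usage", row_limit=5))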

@@ -162,21 +155,21 @@ def forward(self, input, mask):
 """
 
 ######################################################################
-# Improve memory performance
+# Improving memory performance
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-# Note that the most expensive operations - in terms of memory and time -
-# are at ``forward (10)`` representing the operations within MASK INDICES. Let's try to
-# tackle the memory consumption first. We can see that the ``.to()``
-# operation at line 12 consumes 953.67 Mb. This operation copies ``mask`` to the CPU.
-# ``mask`` is initialized with a ``torch.double`` datatype. Can we reduce the memory footprint by casting
-# it to ``torch.float`` instead?
+# The most expensive operations, in terms of both memory and time, are at ``forward (10)``,
+# the operations within MASK INDICES.
+# Let's tackle the memory consumption first.
+# We can see that the ``.to()`` operation at line 12 consumes 953.67 Mb.
+# This operation copies ``mask`` to the CPU.
+# ``mask`` is initialized with the ``torch.double`` datatype.
+# Can we reduce the memory footprint by casting it to ``torch.float`` instead?
 #
 
 model = MyModule(500, 10).cuda()
 input = torch.rand(128, 500).cuda()
 mask = torch.rand((500, 500, 500), dtype=torch.float).cuda()
 
-# warm-up
+# warm-up
 model(input, mask)
 
 with profiler.profile(with_stack=True, profile_memory=True) as prof:
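A quick sanity check on the numbers above: ``mask`` has 500 * 500 * 500 elements, ``torch.double`` takes 8 bytes per element, and ``torch.float`` takes 4, so the copy should shrink to half its size.

    numel = 500 * 500 * 500      # 125,000,000 elements in `mask`
    print(numel * 8 / 2**20)     # ~953.67 MiB as torch.double
    print(numel * 4 / 2**20)     # ~476.84 MiB as torch.float

This matches the 953.67 Mb figure above and the halving reported in the next hunk.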
@@ -227,16 +220,15 @@ def forward(self, input, mask):
 
 ######################################################################
 #
-# The CPU memory footprint for this operation has halved.
+# The CPU memory footprint for this operation has halved.
 #
-# Improve time performance
+# Improving time performance
 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-# While the time consumed has also reduced a bit, it's still too high.
-# Turns out copying a matrix from CUDA to CPU is pretty expensive!
-# The ``aten::copy_`` operator in ``forward (12)`` copies ``mask`` to CPU
-# so that it can use the NumPy ``argwhere`` function. ``aten::copy_`` at ``forward(13)``
-# copies the array back to CUDA as a tensor. We could eliminate both of these if we use a
-# ``torch`` function ``nonzero()`` here instead.
+# While the time consumed has also dropped a bit, it is still too high.
+# It turns out that copying a matrix from CUDA to the CPU is quite expensive!
+# The ``aten::copy_`` operator in ``forward(12)`` copies ``mask`` to the CPU so that the NumPy ``argwhere`` function can be used.
+# The ``aten::copy_`` at ``forward(13)`` copies the array back to CUDA as a tensor.
+# Both of these can be eliminated by using the ``torch`` function ``nonzero()`` here instead.
 #
 
 class MyModule(nn.Module):
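The rewritten ``forward`` is again truncated in this diff. A sketch of what the ``nonzero()`` version described above might look like, a reconstruction under the same assumptions as the earlier one, with the NumPy round-trip removed:

    def forward(self, input, mask):
        with profiler.record_function("LINEAR PASS"):
            out = self.linear(input)

        with profiler.record_function("MASK INDICES"):
            threshold = out.sum(axis=1).mean()
            # Tensor.nonzero() runs on the GPU, so both aten::copy_
            # transfers (device-to-host and host-to-device) disappear.
            hi_idx = (mask > threshold).nonzero(as_tuple=True)

        return out, hi_idx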
@@ -259,7 +251,7 @@ def forward(self, input, mask):
 input = torch.rand(128, 500).cuda()
 mask = torch.rand((500, 500, 500), dtype=torch.float).cuda()
 
-# warm-up
+# warm-up
 model(input, mask)
 
 with profiler.profile(with_stack=True, profile_memory=True) as prof:
@@ -310,11 +302,11 @@ def forward(self, input, mask):
 
 
 ######################################################################
-# Further Reading
+# Further Reading
 # ~~~~~~~~~~~~~~~~~
-# We have seen how Profiler can be used to investigate time and memory bottlenecks in PyTorch models.
-# Read more about Profiler here:
+# We have seen how the profiler can be used to investigate time and memory bottlenecks in PyTorch models.
+# Read more about the profiler here:
 #
-# - `Profiler Usage Recipe <https://tutorials.pytorch.kr/recipes/recipes/profiler.html>`__
+# - `Profiler Usage Recipe <https://tutorials.pytorch.kr/recipes/recipes/profiler_recipe.html>`__
 # - `Profiling RPC-Based Workloads <https://tutorials.pytorch.kr/recipes/distributed_rpc_profiling.html>`__
 # - `Profiler API Docs <https://pytorch.org/docs/stable/autograd.html?highlight=profiler#profiler>`__
