Commit 0f87922

debug
1 parent 6e9a3e0 commit 0f87922

File tree

1 file changed: +51 -0 lines changed

chapter_compiler_frontend/Automatic_Differentiation.md

Lines changed: 51 additions & 0 deletions
@@ -264,3 +264,54 @@ $(\frac{\partial y}{\partial x_1},\cdots,\frac{\partial y}{\partial x_n})$
using a single reverse pass or $n$ forward passes. This is a situation
akin to derivative evaluation for a multi-input, single-output network,
a structure frequently encountered in machine learning.

Due to this feature, reverse-mode automatic differentiation forms the
basis of the backpropagation algorithm, a key technique for training
neural networks. Because it computes gradients efficiently in the
setting of high-dimensional inputs and a scalar output, which is common
in machine learning applications, reverse-mode automatic differentiation
has become indispensable in the field.

However, the reverse mode comes with certain limitations. In the forward
mode, once a source program has been decomposed into a sequence of
elementary operations, derivative evaluation can proceed synchronously
with the execution of those operations, because the order of derivative
evaluation matches the order of execution. In the reverse mode, by
contrast, derivatives are evaluated in the inverse of the program's
execution order, which leads to a two-phase computation: the first phase
executes the source program and stores the intermediate results in
memory, and the second phase retrieves those intermediate results to
evaluate the derivatives. Because these intermediate results must be
kept until the derivatives are evaluated, the reverse mode requires more
memory.
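
To make the two-phase structure concrete, the following sketch contrasts
the two modes on a small function, $y = x_1 x_2 + \sin(x_1)$. It is a
minimal Python illustration rather than framework code; the names
`forward_diff` and `reverse_diff` are hypothetical.

```python
import math

# Forward mode: each elementary operation propagates (value, tangent)
# pairs, so derivatives are produced in the same order as the primal
# execution and no intermediate results need to be stored for later.
def forward_diff(x1, x2):
    # Seed: differentiate with respect to x1 (dx1 = 1, dx2 = 0).
    v1, dv1 = x1, 1.0
    v2, dv2 = x2, 0.0
    v3, dv3 = v1 * v2, dv1 * v2 + v1 * dv2      # product rule
    v4, dv4 = math.sin(v1), math.cos(v1) * dv1  # chain rule
    y, dy = v3 + v4, dv3 + dv4
    return y, dy                                # dy = dy/dx1 only

# Reverse mode: phase 1 executes the program and keeps the intermediate
# results; phase 2 walks them in inverse execution order to accumulate
# adjoints. The stored intermediates are the extra memory cost.
def reverse_diff(x1, x2):
    # Phase 1: primal execution, retaining intermediate values.
    v1, v2 = x1, x2
    v3 = v1 * v2
    v4 = math.sin(v1)
    y = v3 + v4
    # Phase 2: adjoint accumulation in reverse order, using stored values.
    y_bar = 1.0
    v3_bar = y_bar
    v4_bar = y_bar
    v1_bar = v3_bar * v2 + v4_bar * math.cos(v1)
    v2_bar = v3_bar * v1
    return y, (v1_bar, v2_bar)                  # full gradient in one pass

print(forward_diff(2.0, 3.0))  # one pass yields only dy/dx1
print(reverse_diff(2.0, 3.0))  # one pass yields (dy/dx1, dy/dx2)
```

Note that `v1` and `v2` must remain alive after the primal execution so
that the adjoint phase can use them; for long programs such as deep
neural networks, this retained state dominates the memory footprint of
the reverse mode.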

## Implementing Automatic Differentiation

This section explores typical design patterns for implementing automatic
differentiation in machine learning frameworks. These design patterns
can be broadly classified into three categories: elemental libraries,
operator overloading, and source transformation.

### Elemental Libraries

Elemental libraries encapsulate elementary expressions and their
differential expressions as library functions. When coding, users must
manually decompose a program into a set of elementary expressions and
replace them with corresponding library functions. Take the program
$a=(x+y)/z$ as an example; it needs to be manually decomposed as
follows:

```
t = x + y
a = t / z
```

Subsequently, users replace the decomposed elementary expressions with
appropriate library functions:

```
// The parameters include variables x, y, and t and their derivative variables dx, dy, and dt.
call ADAdd(x, dx, y, dy, t, dt)
// The parameters include variables t, z, and a and their derivative variables dt, dz, and da.
call ADDiv(t, dt, z, dz, a, da)
```
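
To illustrate what such library routines compute, here is a minimal
Python sketch; the names `ad_add` and `ad_div` are hypothetical and
simply mirror the `ADAdd`/`ADDiv` calls above, with each routine
returning an elementary result together with its derivative.

```python
# Hypothetical elemental-library routines: each returns the value of an
# elementary operation and its forward-mode derivative.

def ad_add(x, dx, y, dy):
    """t = x + y and its derivative dt."""
    return x + y, dx + dy

def ad_div(t, dt, z, dz):
    """a = t / z and its derivative da (quotient rule)."""
    return t / z, (dt * z - t * dz) / (z * z)

# Differentiate a = (x + y) / z with respect to x at (x, y, z) = (1, 2, 3):
# seed dx = 1 and dy = dz = 0.
t, dt = ad_add(1.0, 1.0, 2.0, 0.0)
a, da = ad_div(t, dt, 3.0, 0.0)
print(a, da)  # a = 1.0, da = 1/3 (since da/dx = 1/z)
```

The burden this places on users is clear: every statement of the source
program must be rewritten by hand as calls of this kind.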
