$(\frac{\partial y}{\partial x_1},\cdots,\frac{\partial y}{\partial x_n})$
using a single reverse pass or $n$ forward passes. This is a situation
akin to derivative evaluation for a multi-input, single-output network,
a structure frequently encountered in machine learning.

Due to this feature, reverse-mode automatic differentiation forms the
basis for the backpropagation algorithm, a key technique for training
neural networks. By enabling efficient computation of gradients,
especially in scenarios with high-dimensional input data and scalar
output (common in many machine learning applications), reverse-mode
automatic differentiation has become indispensable in the field.

However, the reverse mode comes with certain limitations. In the
forward mode, once a source program is decomposed into a sequence of
elementary operations, derivatives can be evaluated synchronously as
those operations execute, because the order of derivative evaluation
matches the order of program execution. In the reverse mode, by
contrast, derivatives are evaluated in the inverse of the program's
execution order, which leads to a two-phase computation. The first
phase executes the source program and stores the intermediate results
in memory; the second phase retrieves these intermediate results to
evaluate the derivatives. Because of this extra bookkeeping, the
reverse mode requires more memory.

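The two phases can be sketched as follows. This is a minimal illustration, not the implementation of any particular framework: the function `grad_f`, the example program $a=(x+y)\cdot x$, and the inlined "tape" are all assumptions made for clarity.

```python
def grad_f(x, y):
    # Phase 1: execute the source program a = (x + y) * x, keeping each
    # intermediate result (here, t) in memory. A real framework would
    # record these operations and values on a tape.
    t = x + y
    a = t * x

    # Phase 2: traverse the recorded operations in reverse, propagating
    # adjoints (derivatives of the output a with respect to each variable).
    da = 1.0         # seed: d a / d a = 1
    dt = da * x      # a = t * x  =>  d a / d t = x
    dx = da * t      # a = t * x  =>  d a / d x = t  (partial contribution)
    dx += dt * 1.0   # t = x + y  =>  d t / d x = 1
    dy = dt * 1.0    # t = x + y  =>  d t / d y = 1
    return dx, dy

# For a = (x + y) * x = x^2 + x*y:  d a/d x = 2x + y,  d a/d y = x.
# grad_f(2.0, 3.0) returns (7.0, 2.0).
```

Note how phase 2 reuses the intermediate value `t` computed in phase 1; it is exactly this retained state that accounts for the reverse mode's higher memory footprint.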
## Implementing Automatic Differentiation

This section explores typical design patterns for implementing automatic
differentiation in machine learning frameworks. These design patterns
can be broadly classified into three categories: elemental libraries,
operator overloading, and source transformation.

### Elemental Libraries

Elemental libraries encapsulate elementary expressions and their
differential expressions as library functions. When coding, users must
manually decompose a program into a set of elementary expressions and
replace them with corresponding library functions. Take the program
$a=(x+y)/z$ as an example; it needs to be manually decomposed as
follows:

```
t = x + y
a = t / z
```

Subsequently, users replace the decomposed elementary expressions with
appropriate library functions:

```
// The parameters include variables x, y, and t and their derivative variables dx, dy, and dt.
call ADAdd(x, dx, y, dy, t, dt)
// The parameters include variables t, z, and a and their derivative variables dt, dz, and da.
call ADDiv(t, dt, z, dz, a, da)
```