Commit 434d132

Fix minor typos

1 parent d333862

File tree: 1 file changed, +9 -9 lines changed


docs/src/theory.md

Lines changed: 9 additions & 9 deletions
````diff
@@ -14,10 +14,10 @@ i.e., the function value and the directional derivative up to order $p$.
 This notation might be unfamiliar to Julia users that had experience with other AD packages, but $\partial f(x)$ is simply the jacobian $J$, and $\partial f(x)\times v$ is simply the Jacobian-vector product (JVP).
 In other words, this is a simple generalization of Jacobian-vector product to Hessian-vector-vector product, and to even higher orders.
 
-The main advantage of doing this instead of doing $p$ first-order Jacobian-vector products is that nesting first-order AD results in exponential scaling w.r.t $p$, while this method, also known as Taylor mode, should be (almost) linear scaling w.r.t $p$.
+The main advantage of doing this instead of doing $p$ first-order Jacobian-vector products is that nesting first-order AD results in exponential scaling w.r.t $p$, while this method, also known as Taylor mode, should scale (almost) linearly w.r.t $p$.
 We will see the reason of this claim later.
 
-In order to achieve this, assuming that $f$ is a nested function $f_k\circ\cdots\circ f_2\circ f_1$, where each $f_i$ is a basic and simple function, or called "primitives".
+In order to achieve this, we assume that $f$ is a nested function $f_k\circ\cdots\circ f_2\circ f_1$, where each $f_i$ is a basic and simple function, also called "primitive".
 We need to figure out how to propagate the derivatives through each step.
 In first order AD, this is achieved by the "dual" pair $x_0+x_1\varepsilon$, where $\varepsilon^2=0$, and for each primitive we make a method overload
 ```math
@@ -118,20 +118,20 @@ Note that this is an elegant and straightforward corollary from the definition o
 
 ## Generic pushforward rule
 
-For a generic $f(x)$, if we don't bother deriving the specific recurrence rule for it, we can still automatically generate pushforward rule in the following manner.
+For a generic $f(x)$, if we don't bother deriving the specific recurrence rule for it, we can still automatically generate a pushforward rule in the following manner.
 Let's denote the derivative of $f$ w.r.t $x$ to be $d(x)$, then for $f(t)=f(x(t))$ we have
 ```math
 f'(t)=d(x(t))x'(t);\quad f(0)=f(x_0)
 ```
 
 when we expand $f$ and $x$ up to order $p$ into this equation, we notice that only order $p-1$ is needed for $d(x(t))$.
-In other words, we turn a problem of finding $p$-th order pushforward for $f$, to a problem of finding $p-1$-th order pushforward for $d$, and we can recurse down to the first order.
-The first-order derivative expressions are captured from ChainRules.jl, which made this process fully automatic.
+In other words, we turn a problem of finding $p$-th order pushforward for $f$, to a problem of finding $(p-1)$-th order pushforward for $d$, and we can recurse down to the first order.
+The first-order derivative expressions are captured from ChainRules.jl, which makes this process fully automatic.
 
-This strategy is in principle equivalent to nesting first-order differentiation, which could potentially leads to exponential scaling; however, in practice there is a huge difference.
-This generation of pushforward rule happens at **compile time**, which gives the compiler a chance to check redundant expressions and optimize it down to quadratic time.
-Compiler has stack limits but this should work for at least up to order 100.
+This strategy is in principle equivalent to nesting first-order differentiation, which could potentially lead to exponential scaling; however, in practice there is a huge difference.
+This generation of pushforward rules happens at **compile time**, which gives the compiler a chance to check redundant expressions and optimize it down to quadratic time.
+The compiler has stack limits but this should work at least up to order 100.
 
-In the current implementation of TaylorDiff.jl, all $\log$-like functions' pushforward rules are generated by this strategy, since their derivatives are simple algebraic expressions; some $\exp$-like functions, like sinh, is also generated; the most-often-used several $\exp$-like functions are hand-written with hand-derived recurrence relations.
+In the current implementation of TaylorDiff.jl, all $\log$-like functions' pushforward rules are generated by this strategy, since their derivatives are simple algebraic expressions; some $\exp$-like functions, like $\sinh$, are also generated; several of the most-often-used $\exp$-like functions are hand-written with hand-derived recurrence relations.
 
 If you find that the code generated by this strategy is slow, please file an issue and we will look into it.
````
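For context on the first hunk: the "dual" pair $x_0+x_1\varepsilon$ with $\varepsilon^2=0$ can be illustrated with a minimal Julia sketch. The `Dual` type and overloads below are hypothetical, written only to show how a per-primitive method overload propagates first-order derivatives; this is not TaylorDiff.jl's actual implementation.

```julia
# A dual number x0 + x1*ε with ε^2 = 0: x0 is the primal value,
# x1 the coefficient of ε (the directional derivative).
struct Dual{T<:Number}
    x0::T
    x1::T
end

# Multiplication: (a0 + a1 ε)(b0 + b1 ε) = a0 b0 + (a0 b1 + a1 b0) ε,
# since the ε^2 term vanishes.
Base.:*(a::Dual, b::Dual) = Dual(a.x0 * b.x0, a.x0 * b.x1 + a.x1 * b.x0)

# One primitive overload: sin(x0 + x1 ε) = sin(x0) + cos(x0) x1 ε.
Base.sin(a::Dual) = Dual(sin(a.x0), cos(a.x0) * a.x1)

x = Dual(2.0, 1.0)  # seed: value 2.0, tangent 1.0
y = sin(x * x)      # y.x0 == sin(4.0); y.x1 == 4cos(4.0) = d/dx sin(x^2) at x = 2
```

Taylor mode generalizes this pair to a full coefficient vector $(x_0, x_1, \ldots, x_p)$, with one overload per primitive propagating all $p$ coefficients at once.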
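The order-reduction argument in the second hunk (a $p$-th order pushforward for $f$ built from a $(p-1)$-th order pushforward for $d$) can likewise be sketched by hand. The helper names below (`taylor_log`, `taylor_inv`, `truncmul`, `integrate`) are hypothetical, and TaylorDiff.jl generates such rules at compile time from ChainRules.jl derivative expressions rather than by runtime recursion; this is only a sketch of the recursion itself, using $f = \log$ with $d(x) = 1/x$.

```julia
# Coefficient vectors c[1:p+1] represent x(t) = c[1] + c[2] t + ... + c[p+1] t^p.

# Product of two truncated Taylor polynomials, keeping the order of `a`.
function truncmul(a::Vector{Float64}, b::Vector{Float64})
    c = zeros(length(a))
    for i in 1:length(a), j in 1:length(b)
        i + j - 1 <= length(a) && (c[i + j - 1] += a[i] * b[j])
    end
    return c
end

# Integrate term by term: from f'(t) coefficients and f(0), recover f(t).
integrate(df::Vector{Float64}, f0::Float64) =
    [f0; [df[k] / k for k in 1:length(df)]]

# Pushforward of log through x(t), to order p = length(x) - 1.
# Base case p = 0 is the primal value; otherwise use f'(t) = d(x(t)) x'(t),
# where d(x(t)) = 1/x(t) is only needed to order p - 1.
function taylor_log(x::Vector{Float64})
    length(x) == 1 && return [log(x[1])]
    xp = [k * x[k + 1] for k in 1:length(x) - 1]  # x'(t), order p - 1
    dx = taylor_inv(x[1:end - 1])                 # 1/x(t), order p - 1
    return integrate(truncmul(dx, xp), log(x[1]))
end

# Same recursion for d(x) = 1/x, using d'(x) = -1/x^2.
function taylor_inv(x::Vector{Float64})
    length(x) == 1 && return [1 / x[1]]
    xp = [k * x[k + 1] for k in 1:length(x) - 1]
    d = taylor_inv(x[1:end - 1])
    return integrate(truncmul(-truncmul(d, d), xp), 1 / x[1])
end

taylor_log([2.0, 1.0, 0.5])  # Taylor coefficients of log(2 + t + 0.5t^2) up to t^2
```

Each recursive call drops one order, so the chain bottoms out at the plain first-order derivative expression; doing the generation at compile time is what lets the compiler eliminate the redundant subexpressions that would otherwise make this nesting exponential.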
