Commit 76298d3

More details on distribution pullback for Reinforce (#19)
1 parent 01352d4 commit 76298d3

1 file changed (+13, -0 lines changed)

docs/src/background.md

Lines changed: 13 additions & 0 deletions
@@ -110,6 +110,19 @@ The Monte-Carlo approximation for this is

$$\nabla_\theta q(y | \theta) \simeq \frac{1}{S} \sum_{s=1}^S \mathbf{1} \{f(x_s) = y\} ~ \nabla_\theta \log p(x_s | \theta)$$

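To see the Monte-Carlo approximation in action, here is a minimal pure-Python sketch, not the package's implementation. It assumes a categorical model $p(x | \theta) = \mathrm{softmax}(\theta)_x$, whose score is $\partial_{\theta_k} \log p(x | \theta) = \mathbf{1}\{k = x\} - p(k | \theta)$, and a hypothetical `f` that merges pairs of outcomes; the helper names are illustrative only. The estimator is compared against the exact gradient $\nabla_\theta q(y | \theta) = \sum_{x : f(x) = y} p(x | \theta) \nabla_\theta \log p(x | \theta)$.

```python
import math
import random


def softmax(theta):
    """Numerically stable softmax: the probabilities p(x | theta) of a categorical model."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_p(x, theta):
    """Score of the categorical model: d/dtheta_k log p(x | theta) = 1{k == x} - p(k | theta)."""
    probs = softmax(theta)
    return [(1.0 if k == x else 0.0) - probs[k] for k in range(len(theta))]


def reinforce_grad_q(theta, f, y, num_samples, rng):
    """Monte-Carlo estimate (1/S) sum_s 1{f(x_s) = y} grad log p(x_s | theta), x_s ~ p(. | theta)."""
    probs = softmax(theta)
    xs = rng.choices(range(len(theta)), weights=probs, k=num_samples)
    scores = [grad_log_p(x, theta) for x in range(len(theta))]  # cache one score per outcome
    grad = [0.0] * len(theta)
    for x in xs:
        if f(x) == y:  # indicator 1{f(x_s) = y}
            for k, g in enumerate(scores[x]):
                grad[k] += g
    return [g / num_samples for g in grad]


def exact_grad_q(theta, f, y):
    """Exact gradient of q(y | theta) = sum_{x : f(x) = y} p(x | theta), for comparison."""
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for x in range(len(theta)):
        if f(x) == y:
            for k, g in enumerate(grad_log_p(x, theta)):
                grad[k] += probs[x] * g
    return grad


def f(x):
    # Hypothetical f: merges outcomes {0, 1} -> 0 and {2, 3} -> 1.
    return x // 2


theta = [0.5, -0.2, 1.0, 0.0]
estimate = reinforce_grad_q(theta, f, y=0, num_samples=100_000, rng=random.Random(0))
exact = exact_grad_q(theta, f, y=0)
print("estimate:", [round(v, 3) for v in estimate])
print("exact:   ", [round(v, 3) for v in exact])
```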
In our implementation, we assume that the sampled outputs $y_s = f(x_s)$ are pairwise distinct (this may not be strictly necessary), and that together they form the whole support of the distribution $q$.
We can thus consider the vector-to-vector mapping

$$q : \theta \longmapsto \begin{pmatrix} q(y_1|\theta) \\ \vdots \\ q(y_S | \theta) \end{pmatrix}$$

Applying the Monte-Carlo approximation above to each $y_s$, the indicator $\mathbf{1}\{f(x_{s'}) = y_s\}$ is nonzero only for $s' = s$ because the $y_s$ are distinct, so each row keeps a single term. The Jacobian of this mapping is therefore given by

$$\partial_\theta q(\theta) = \frac{1}{S} \begin{pmatrix} \nabla_\theta \log p(x_1 | \theta)^\top \\ \vdots \\ \nabla_\theta \log p(x_S | \theta)^\top \end{pmatrix}$$

and its vector-Jacobian product (VJP) with a cotangent vector $\bar{q} \in \mathbb{R}^S$ is given by

$$\partial_\theta q(\theta)^\top \bar{q} = \frac{1}{S} \sum_{s=1}^S \bar{q}_s \nabla_\theta \log p(x_s | \theta)$$
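
The Jacobian and VJP above can be checked on a toy problem. The sketch below is a minimal pure-Python illustration, not the package's implementation; it assumes a categorical model $p(x | \theta) = \mathrm{softmax}(\theta)_x$, a hypothetical `f` that merges pairs of outcomes, and hypothetical samples and cotangent. `jacobian` builds the rows $\frac{1}{S} \nabla_\theta \log p(x_s | \theta)^\top$, `jacobian_with_indicators` keeps the indicators of the general Monte-Carlo formula (the two agree when the $f(x_s)$ are pairwise distinct), and `vjp` computes $\partial_\theta q(\theta)^\top \bar{q}$ directly as a weighted sum of scores.

```python
import math


def softmax(theta):
    """Numerically stable softmax: p(x | theta) for a categorical model."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_p(x, theta):
    """Score of the categorical model: 1{k == x} - p(k | theta) for each k."""
    probs = softmax(theta)
    return [(1.0 if k == x else 0.0) - probs[k] for k in range(len(theta))]


def jacobian(xs, theta):
    """Row s is (1/S) grad log p(x_s | theta)^T; valid when the f(x_s) are pairwise distinct."""
    S = len(xs)
    return [[g / S for g in grad_log_p(x, theta)] for x in xs]


def jacobian_with_indicators(xs, theta, f):
    """Row s is (1/S) sum_t 1{f(x_t) = f(x_s)} grad log p(x_t | theta)^T (general MC formula)."""
    S = len(xs)
    rows = []
    for x_s in xs:
        row = [0.0] * len(theta)
        for x_t in xs:
            if f(x_t) == f(x_s):
                for k, g in enumerate(grad_log_p(x_t, theta)):
                    row[k] += g / S
        rows.append(row)
    return rows


def vjp(xs, theta, q_bar):
    """(1/S) sum_s q_bar_s grad log p(x_s | theta), computed without forming the Jacobian."""
    S = len(xs)
    out = [0.0] * len(theta)
    for x, qb in zip(xs, q_bar):
        for k, g in enumerate(grad_log_p(x, theta)):
            out[k] += qb * g / S
    return out


def f(x):
    # Hypothetical f: merges outcomes {0, 1} -> 0 and {2, 3} -> 1.
    return x // 2


theta = [0.5, -0.2, 1.0, 0.0]
xs = [1, 2]          # hypothetical samples with distinct outputs f(1) = 0, f(2) = 1
q_bar = [0.3, -1.2]  # hypothetical cotangent, one entry per y_s

J = jacobian(xs, theta)
J_transpose_q_bar = [sum(J[s][k] * q_bar[s] for s in range(len(xs))) for k in range(len(theta))]
print(vjp(xs, theta, q_bar))
print(J_transpose_q_bar)
```

The `vjp` function never materializes the $S \times d$ Jacobian (with $d$ the dimension of $\theta$); it only needs one score vector per sample, accumulated with weight $\bar{q}_s / S$.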
### Reparametrization probability gradients

To leverage reparametrization, we perform a change of variables:
