The full code is available here: https://github.com/wiseodd/generative-models.

Vanilla GAN is a method for learning the marginal distribution of data, $P(X)$. It has since been extended to learn the conditional distribution $P(X \vert c)$. A natural next extension of GAN is to learn the joint distribution $P(X_1, X_2)$, where $X_1$ and $X_2$ come from different domains, e.g. a color image and its corresponding B&W version.

Coupled GAN (CoGAN) extends GAN so that it can learn a joint distribution using only samples from the marginals. That is, we do not need samples from the joint distribution $P(X_1, X_2)$, i.e. tuples $(x_1, x_2)$, during training. We only need $x_1 \sim P(X_1)$ and $x_2 \sim P(X_2)$, samples from the marginal distributions. This property makes CoGAN very useful, as collecting representative samples from the joint distribution is costly due to the curse of dimensionality.

## Learning joint distribution by sharing weights

So, how exactly does CoGAN learn the joint distribution using only the marginals?

The trick is to add a constraint so that the high level representations of the data are shared. Specifically, we constrain our networks to have the same weights in several layers. The intuition is that, by constraining those weights to be identical, CoGAN will converge to a solution in which they encode a shared (joint) representation of both domains.

But which layers should be constrained? To answer this, observe that neural nets used for classification tasks learn data representations in a bottom-up fashion, i.e. from low level to high level representations. Low level representations are highly specialized to the data at hand and are not general enough to share. Hence, we constrain the layers that encode the high level representation.

Intuitively, the lower level layers capture image-specific features, e.g. the thickness of edges, the saturation of colors, etc., whereas the higher level layers capture more general features, such as the abstract notion of "bird" or "dog", ignoring color or edge thickness. So, to capture the joint representation of the data, we share the higher level layers, and use the lower level layers to turn that abstract representation into image-specific features, so that we get images that are correct (in the general sense) and plausible (in the detailed sense).

Using that reasoning, we can choose which layers to constrain. For the discriminator, it should be the last layers. For the generator, it should be the first layers, since the generator in a GAN solves the inverse problem: going from the latent representation $z$ to the image $X$.
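
To make the layer choice concrete, here is a minimal PyTorch sketch of one way to wire up the sharing. The layer sizes and the helper names `G1_head`, `G2_head`, `D1_base` and `D2_base` are assumptions made for illustration and are not necessarily what the full code uses; only `G_shared`, `D_shared`, `G1`, `G2`, `D1` and `D2` are names that appear in this post.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 100-d latent code, 128 hidden units, flattened 28x28 images
z_dim, h_dim, x_dim = 100, 128, 784

# Generators: the FIRST layer (closest to z, most abstract) is shared
G_shared = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU())
G1_head = nn.Sequential(nn.Linear(h_dim, x_dim), nn.Sigmoid())  # domain 1 details
G2_head = nn.Sequential(nn.Linear(h_dim, x_dim), nn.Sigmoid())  # domain 2 details

# Discriminators: the LAST layer (most abstract) is shared
D1_base = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())     # domain 1 features
D2_base = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())     # domain 2 features
D_shared = nn.Sequential(nn.Linear(h_dim, 1), nn.Sigmoid())


def G1(z):
    return G1_head(G_shared(z))


def G2(z):
    return G2_head(G_shared(z))


def D1(x):
    return D_shared(D1_base(x))


def D2(x):
    return D_shared(D2_base(x))


z = torch.randn(4, z_dim)
x1, x2 = G1(z), G2(z)        # same z, two domains
p1, p2 = D1(x1), D2(x2)      # probability of "real" for each domain
```

Because both branches call the very same `G_shared` and `D_shared` modules, the constraint "these layers have identical weights" holds by construction, with no extra penalty term needed.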

## CoGAN algorithm

If we want to learn the joint distribution of $K$ domains, we need $2K$ neural nets, since each domain needs a discriminator and a generator. Fortunately, since CoGAN is centered on weight sharing, part of each net is shared across domains, which helps reduce the computational cost.

The algorithm for CoGAN for 2 domains is as follows:

Notice that CoGAN draws samples from each marginal distribution. That means we only need two sets of training data; we do not need to construct specialized training data that captures the joint distribution of the two domains. However, since we learn the joint distribution by sharing weights on high level features, CoGAN training can only succeed if the two domains actually share some high level representation.
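
For reference, the objective behind the two-domain algorithm can be sketched in the usual GAN minimax form. The notation here is chosen to match the names used in this post ($G_1, G_2, D_1, D_2$, and a prior $P(z)$), so treat it as a summary under those assumptions rather than a quotation:

$$
\min_{G_1, G_2} \max_{D_1, D_2} \;
\mathbb{E}_{x_1 \sim P(X_1)} [\log D_1(x_1)]
+ \mathbb{E}_{z \sim P(z)} [\log (1 - D_1(G_1(z)))]
+ \mathbb{E}_{x_2 \sim P(X_2)} [\log D_2(x_2)]
+ \mathbb{E}_{z \sim P(z)} [\log (1 - D_2(G_2(z)))]
$$

subject to $G_1$ and $G_2$ having identical weights in their first layers, and $D_1$ and $D_2$ having identical weights in their last layers. Each expectation only involves one marginal, which is why no paired samples $(x_1, x_2)$ are required, and the same $z$ feeds both generators.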

Now we are ready to train CoGAN. At each training iteration, we perform the steps below. First, we sample images from both marginal training sets, and $z$ from our prior:

Then we just add up those losses. During backpropagation, `D_shared` naturally receives gradients from both `D1` and `D2`, i.e. the sum over both branches. To get the average, all we need to do is scale them:

```python
# ...
D_loss = D1_loss + D2_loss
D_loss.backward()

# Average the gradients
for p in D_shared.parameters():
    p.grad.data = 0.5 * p.grad.data
```

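
As a sanity check on the claim that the shared layer receives the sum of both branches' gradients, the toy sketch below compares the gradient that reaches the shared layer against the gradients of each branch computed separately. The tiny layers, the random inputs and the stand-in losses are all hypothetical; they are not the real nets or the real GAN loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy layers standing in for the two branches and the shared head
D1_base = nn.Linear(4, 3)
D2_base = nn.Linear(4, 3)
D_shared = nn.Linear(3, 1)

x1, x2 = torch.randn(8, 4), torch.randn(8, 4)


def branch_losses():
    # Stand-in scalar losses; any differentiable scalar works for this check
    D1_loss = D_shared(D1_base(x1)).mean()
    D2_loss = D_shared(D2_base(x2)).mean()
    return D1_loss, D2_loss


# Gradient of each branch's loss alone w.r.t. the shared weight
D1_loss, D2_loss = branch_losses()
g1 = torch.autograd.grad(D1_loss, D_shared.weight)[0]
g2 = torch.autograd.grad(D2_loss, D_shared.weight)[0]

# Backpropagating the summed loss accumulates both into .grad
D1_loss, D2_loss = branch_losses()
D_loss = D1_loss + D2_loss
D_loss.backward()
summed = D_shared.weight.grad.clone()

# Scaling by 0.5 turns the sum into the average of the two branches
for p in D_shared.parameters():
    p.grad.data = 0.5 * p.grad.data
averaged = D_shared.weight.grad.clone()

print(torch.allclose(summed, g1 + g2, atol=1e-6))
print(torch.allclose(averaged, 0.5 * (g1 + g2), atol=1e-6))
```

Note that only the shared parameters are rescaled: the branch-specific layers each see a single loss, so their gradients are left untouched.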
Now that we have all the gradients, we can update the weights:

Training the generators is similar to training the discriminators: we need to average the gradients of the `G1` and `G2` losses w.r.t. `G_shared`.

```python
# Generator
G1_sample = G1(z)
D1_fake = D1(G1_sample)
# ...
G_loss = G1_loss + G2_loss
G_loss.backward()

# Average the gradients
for p in G_shared.parameters():
    p.grad.data = 0.5 * p.grad.data

G_solver.step()
reset_grad()
```

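
Putting the pieces together, one full training iteration might look like the sketch below. It is self-contained but rests on several assumptions: random tensors stand in for the two marginal training sets, the nets are tiny MLPs with made-up sizes, binary cross-entropy is used as the GAN loss, and `D_solver`, `G1_head`, `G2_head`, `D1_base`, `D2_base` and `train_step` are hypothetical names. Refer to the full code linked at the top for the actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
mb_size, z_dim, h_dim, x_dim = 32, 100, 128, 784  # hypothetical sizes

G_shared = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU())
G1_head = nn.Sequential(nn.Linear(h_dim, x_dim), nn.Sigmoid())
G2_head = nn.Sequential(nn.Linear(h_dim, x_dim), nn.Sigmoid())
D1_base = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
D2_base = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
D_shared = nn.Sequential(nn.Linear(h_dim, 1), nn.Sigmoid())

def G1(z): return G1_head(G_shared(z))
def G2(z): return G2_head(G_shared(z))
def D1(x): return D_shared(D1_base(x))
def D2(x): return D_shared(D2_base(x))

nets = [G_shared, G1_head, G2_head, D1_base, D2_base, D_shared]
G_params = [p for net in nets[:3] for p in net.parameters()]
D_params = [p for net in nets[3:] for p in net.parameters()]
G_solver = torch.optim.Adam(G_params, lr=1e-3)
D_solver = torch.optim.Adam(D_params, lr=1e-3)

def reset_grad():
    for net in nets:
        net.zero_grad()

ones = torch.ones(mb_size, 1)    # target label for "real"
zeros = torch.zeros(mb_size, 1)  # target label for "fake"

def train_step(X1, X2):
    # The same z is fed to both generators
    z = torch.randn(mb_size, z_dim)

    # Discriminator step: real vs. fake in each domain
    D1_loss = F.binary_cross_entropy(D1(X1), ones) + F.binary_cross_entropy(D1(G1(z)), zeros)
    D2_loss = F.binary_cross_entropy(D2(X2), ones) + F.binary_cross_entropy(D2(G2(z)), zeros)
    D_loss = D1_loss + D2_loss
    D_loss.backward()
    for p in D_shared.parameters():  # average the shared gradients
        p.grad.data = 0.5 * p.grad.data
    D_solver.step()
    reset_grad()

    # Generator step: try to make both discriminators say "real"
    G1_loss = F.binary_cross_entropy(D1(G1(z)), ones)
    G2_loss = F.binary_cross_entropy(D2(G2(z)), ones)
    G_loss = G1_loss + G2_loss
    G_loss.backward()
    for p in G_shared.parameters():  # average the shared gradients
        p.grad.data = 0.5 * p.grad.data
    G_solver.step()
    reset_grad()
    return D_loss.item(), G_loss.item()

# Random stand-ins for one minibatch from each marginal (unpaired)
X1 = torch.rand(mb_size, x_dim)
X2 = torch.rand(mb_size, x_dim)
d_loss, g_loss = train_step(X1, X2)
```

`X1` and `X2` are drawn independently here, which mirrors the point of the method: nothing in the step requires the two minibatches to be paired.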
## Results

After many thousands of iterations, `G1` and `G2` will produce samples like these. Note that the first two rows are the normal MNIST images, and the next two rows are the rotated images. Also, the same $z$ was fed into both `G1` and `G2`, so we can see that, given the same latent code $z$, we can sample a pair $(x_1, x_2)$ of corresponding images from the joint distribution.
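
Drawing such corresponding pairs amounts to pushing one batch of $z$ through both generators. The sketch below shows only the mechanics: the nets are untrained, and their sizes and the names `G1_head` and `G2_head` are assumptions for illustration, so the outputs are noise rather than digits.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
z_dim, h_dim, x_dim = 100, 128, 784  # hypothetical sizes

G_shared = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU())
G1_head = nn.Sequential(nn.Linear(h_dim, x_dim), nn.Sigmoid())
G2_head = nn.Sequential(nn.Linear(h_dim, x_dim), nn.Sigmoid())

with torch.no_grad():                 # sampling only, no gradients needed
    z = torch.randn(16, z_dim)        # one batch of latent codes
    h = G_shared(z)                   # shared high level code, computed once
    x1 = G1_head(h).view(-1, 28, 28)  # domain 1 images
    x2 = G2_head(h).view(-1, 28, 28)  # domain 2 images, paired with x1 by index
```

After training, `x1[i]` and `x2[i]` would be the two views of the same underlying sample, since they are decoded from the same `h[i]`.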

Obviously, if we swapped our nets for more powerful ones, we could get higher quality samples.

If we squint, we can see that, _roughly_, the images in the third row are 90 degree rotations of those in the first row. Likewise, the fourth row contains the images corresponding to the second row.

This is a marvelous result considering we did not explicitly show CoGAN any samples from the joint distribution (i.e. tuples $(x_1, x_2)$). We only showed it samples from the disjoint marginals. In summary, CoGAN is able to infer the joint distribution by itself.