
Commit 5709e85

More md -> mdx
1 parent 069cd4f commit 5709e85


5 files changed: 261 additions, 270 deletions


.astro/types.d.ts

Lines changed: 10 additions & 10 deletions
@@ -227,13 +227,13 @@ declare module 'astro:content' {
 collection: "post";
 data: InferEntrySchema<"post">
 } & { render(): Render[".md"] };
-"coupled_gan.md": {
-id: "coupled_gan.md";
-slug: "coupled-gan";
+"coupled-gan.mdx": {
+id: "coupled-gan.mdx";
+slug: "coupled-gan";
 body: string;
 collection: "post";
 data: InferEntrySchema<"post">
-} & { render(): Render[".md"] };
+} & { render(): Render[".mdx"] };
 "deploying-wagtail.md": {
 id: "deploying-wagtail.md";
 slug: "deploying-wagtail";
@@ -318,13 +318,13 @@ declare module 'astro:content' {
 collection: "post";
 data: InferEntrySchema<"post">
 } & { render(): Render[".mdx"] };
-"infogan.md": {
-id: "infogan.md";
+"infogan.mdx": {
+id: "infogan.mdx";
 slug: "infogan";
 body: string;
 collection: "post";
 data: InferEntrySchema<"post">
-} & { render(): Render[".md"] };
+} & { render(): Render[".mdx"] };
 "jekyll-fb-share.md": {
 id: "jekyll-fb-share.md";
 slug: "jekyll-fb-share";
@@ -528,13 +528,13 @@ declare module 'astro:content' {
 collection: "post";
 data: InferEntrySchema<"post">
 } & { render(): Render[".md"] };
-"wasserstein-gan.md": {
-id: "wasserstein-gan.md";
+"wasserstein-gan.mdx": {
+id: "wasserstein-gan.mdx";
 slug: "wasserstein-gan";
 body: string;
 collection: "post";
 data: InferEntrySchema<"post">
-} & { render(): Render[".md"] };
+} & { render(): Render[".mdx"] };
 };

 };
Lines changed: 53 additions & 58 deletions
@@ -5,33 +5,39 @@ publishDate: 2017-02-18 04:27
 tags: [machine learning, gan]
 ---

-The full code is available here: <https://github.com/wiseodd/generative-models>.
+import BlogImage from "@/components/BlogImage.astro";

-[Vanilla GAN]() is a method to learn marginal distribution of data \\( P(X) \\). Since then, it has been extended to make it [learns conditional distribution]() \\( P(X \vert c) \\). Naturally, the next extension of GAN is to learn joint distribution of data \\( P(X_1, X_2) \\), where \\( X_1 \\) and \\( X_2 \\) are from different domain, e.g. color image and its corresponding B&W version.
+The full code is available here: https://github.com/wiseodd/generative-models.

-Coupled GAN (CoGAN) is a method that extends GAN so that it could learn joint distribution, by only needing samples from the marginals. What it means is that we do not need to sample from joint distribution \\( P(X_1, X_2) \\), i.e. a tuple of \\( \(x_1, x_2\) \\), during training. We only need \\( x_1 \sim P(X_1) \\) and \\( x_2 \sim P(X_2) \\), samples from the marginal distributions. This property makes CoGAN very useful as collecting representing samples of joint distribution is costly due to curse of dimensionality.
+Vanilla GAN is a method to learn marginal distribution of data $P(X)$. Since then, it has been extended to make it learns conditional distribution $P(X \vert c)$. Naturally, the next extension of GAN is to learn joint distribution of data $P(X_1, X_2)$, where $X_1$ and $X_2$ are from different domain, e.g. color image and its corresponding B&W version.
+
+Coupled GAN (CoGAN) is a method that extends GAN so that it could learn joint distribution, by only needing samples from the marginals. What it means is that we do not need to sample from joint distribution $P(X_1, X_2)$, i.e. a tuple of $(x_1, x_2)$, during training. We only need $x_1 \sim P(X_1)$ and $x_2 \sim P(X_2)$, samples from the marginal distributions. This property makes CoGAN very useful as collecting representing samples of joint distribution is costly due to curse of dimensionality.

 ## Learning joint distribution by sharing weights

 So, how exactly does CoGAN learn joint distribution by only using the marginals?

 The trick here is to add a constraint such that high level representations of data are shared. Specifically, we constraint our networks to have the same weights on several layers. The intuition is that by constraining the weights to be identical to each other, CoGAN will converge to the optimum solution where those weights represent shared representation (joint representation) of both domains of data.

-![CoGAN schematic]({{ site.baseurl }}/img/2017-02-18-coupled-gan/schematic.png)
+<BlogImage
+  imagePath='/img/coupled-gan/schematic.png'
+  altText='CoGAN schematic.'
+  fullWidth
+/>

 But which layers should be constrained? To answer this, we need to observe that neural nets that are used for classification tasks learn data representation in bottom-up fashion, i.e. from low level representation to high level representation. We notice that low level representation is highly specialized on data, which is not general enough. Hence, we constraint our neural net on several layers that encode the high level representation.

 Intuitively, the lower level layers capture image specific features, e.g. the thickness of edges, the saturation of colors, etc. But, higher level layers capture more general features, such as the abstract representation of "bird", "dog", etc., ignoring the color or the thickness of the images. So, naturally, to capture joint representation of data, we want to use higher level layers, then use lower level layers to encode those abstract representation into image specific features, so that we get the correct (in general sense) and plausible (in detailed sense) images.

-Using that reasoning, we then could choose which layers should be constrained. For discriminator, it should be the last layers. For generator, it should be the first layers, as generator in GAN solves inverse problem: from latent representation \\( z \\) to image \\( X \\).
+Using that reasoning, we then could choose which layers should be constrained. For discriminator, it should be the last layers. For generator, it should be the first layers, as generator in GAN solves inverse problem: from latent representation $z$ to image $X$.

 ## CoGAN algorithm

-If we want to learn joint distribution of \\( K \\) domains, then we need to use \\( 2K \\) neural nets, as for each domain we need a discriminator and a generator. Fortunately, as CoGAN is centered on weight sharing, this could prove helpful to reduce the computation cost.
+If we want to learn joint distribution of $K$ domains, then we need to use $2K$ neural nets, as for each domain we need a discriminator and a generator. Fortunately, as CoGAN is centered on weight sharing, this could prove helpful to reduce the computation cost.

 The algorithm for CoGAN for 2 domains is as follows:

-![CoGAN algo]({{ site.baseurl }}/img/2017-02-18-coupled-gan/algo.png)
+<BlogImage imagePath='/img/coupled-gan/algo.png' altText='CoGAN algorithm.' fullWidth />

 Notice that CoGAN draws samples from each marginal distribution. That means, we only need 2 sets of training data. We do not need to construct specialized training data that captures joint distribution of those two domains. However, as we learn joint distribution by weight sharing on high level features, to make CoGAN training successful, we have to make sure that those two domains of data share some high level representations.

@@ -44,50 +50,48 @@ X_train = mnist.train.images
 half = int(X_train.shape[0] / 2)

 # Real image
-
 X_train1 = X_train[:half]

 # Rotated image
-
 X_train2 = X_train[half:].reshape(-1, 28, 28)
 X_train2 = scipy.ndimage.interpolation.rotate(X_train2, 90, axes=(1, 2))
-X_train2 = X_train2.reshape(-1, 28\*28)
+X_train2 = X_train2.reshape(-1, 28*28)
 ```

 Let's declare the generators first, which are two layers fully connected nets, with first weight (input to hidden) shared:

 ```python
 """ Shared Generator weights """
 G_shared = torch.nn.Sequential(
-torch.nn.Linear(z_dim, h_dim),
-torch.nn.ReLU(),
+torch.nn.Linear(z_dim, h_dim),
+torch.nn.ReLU(),
 )

 """ Generator 1 """
-G1\_ = torch.nn.Sequential(
-torch.nn.Linear(h_dim, X_dim),
-torch.nn.Sigmoid()
+G1_ = torch.nn.Sequential(
+torch.nn.Linear(h_dim, X_dim),
+torch.nn.Sigmoid()
 )

 """ Generator 2 """
-G2\_ = torch.nn.Sequential(
-torch.nn.Linear(h_dim, X_dim),
-torch.nn.Sigmoid()
+G2_ = torch.nn.Sequential(
+torch.nn.Linear(h_dim, X_dim),
+torch.nn.Sigmoid()
 )
 ```

 Then we make a wrapper for those nets:

 ```python
 def G1(z):
-h = G*shared(z)
-X = G1*(h)
-return X
+h = G_shared(z)
+X = G1_(h)
+return X

 def G2(z):
-h = G*shared(z)
-X = G2*(h)
-return X
+h = G_shared(z)
+X = G2_(h)
+return X
 ```

 Notice that `G_shared` are being used in those two nets.
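
The weight sharing in the hunk above works because `G1` and `G2` both route through the very same `G_shared` module, so there is only one set of shared parameters and it receives gradient contributions from both branches. The snippet below is an illustrative sketch of that point rather than code from this diff; it reuses the names `G_shared`, `G1_`, `G2_` from the post and assumes toy values for `z_dim`, `h_dim`, `X_dim` (modern PyTorch).

```python
import torch

# Toy sizes assumed for illustration; the post defines z_dim, h_dim, X_dim elsewhere.
z_dim, h_dim, X_dim = 100, 128, 784

# Same construction as in the post: one shared trunk, two domain-specific heads.
G_shared = torch.nn.Sequential(torch.nn.Linear(z_dim, h_dim), torch.nn.ReLU())
G1_ = torch.nn.Sequential(torch.nn.Linear(h_dim, X_dim), torch.nn.Sigmoid())
G2_ = torch.nn.Sequential(torch.nn.Linear(h_dim, X_dim), torch.nn.Sigmoid())

def G1(z):
    return G1_(G_shared(z))

def G2(z):
    return G2_(G_shared(z))

z = torch.randn(4, z_dim)

# Backprop through both generators: the single shared trunk accumulates
# gradients from the G1 branch and the G2 branch in the same buffers.
(G1(z).sum() + G2(z).sum()).backward()
print(G_shared[0].weight.grad is not None)  # True: one gradient buffer, fed by both branches
```

Because only the trunk is tied, each domain still keeps its own low-level, image-specific head weights, which is exactly the layer split the post argues for.
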
@@ -97,46 +101,44 @@ The discriminators are also two layers nets, similar to the generators, but shar
 ```python
 """ Shared Discriminator weights """
 D_shared = torch.nn.Sequential(
-torch.nn.Linear(h_dim, 1),
-torch.nn.Sigmoid()
+torch.nn.Linear(h_dim, 1),
+torch.nn.Sigmoid()
 )

 """ Discriminator 1 """
-D1\_ = torch.nn.Sequential(
-torch.nn.Linear(X_dim, h_dim),
-torch.nn.ReLU()
+D1_ = torch.nn.Sequential(
+torch.nn.Linear(X_dim, h_dim),
+torch.nn.ReLU()
 )

 """ Discriminator 2 """
-D2\_ = torch.nn.Sequential(
-torch.nn.Linear(X_dim, h_dim),
-torch.nn.ReLU()
+D2_ = torch.nn.Sequential(
+torch.nn.Linear(X_dim, h_dim),
+torch.nn.ReLU()
 )

 def D1(X):
-h = D1\_(X)
-y = D_shared(h)
-return y
+h = D1_(X)
+y = D_shared(h)
+return y

 def D2(X):
-h = D2\_(X)
-y = D_shared(h)
-return y
+h = D2_(X)
+y = D_shared(h)
+return y
 ```

 Next, we construct the optimizer:

 ```python
-D*params = (list(D1*.parameters()) + list(D2*.parameters()) +
-list(D_shared.parameters()))
-G_params = (list(G1*.parameters()) + list(G2\_.parameters()) +
-list(G_shared.parameters()))
+D_params = (list(D1.parameters()) + list(D2.parameters()) + list(D_shared.parameters()))
+G_params = (list(G1.parameters()) + list(G2.parameters()) + list(G_shared.parameters()))

 G_solver = optim.Adam(G_params, lr=lr)
 D_solver = optim.Adam(D_params, lr=lr)
 ```

-Now we are ready to train CoGAN. At each training iteration, we do these steps below. First, we sample images from both marginal training sets, and \\( z \\) from our prior:
+Now we are ready to train CoGAN. At each training iteration, we do these steps below. First, we sample images from both marginal training sets, and $z$ from our prior:

 ```python
 X1 = sample_x(X_train1, mb_size)
@@ -151,11 +153,8 @@ G1_sample = G1(z)
 D1_real = D1(X1)
 D1_fake = D1(G1_sample)

-D1_loss = torch.mean(-torch.log(D1_real + 1e-8) -
-torch.log(1. - D1_fake + 1e-8))
-
-D2_loss = torch.mean(-torch.log(D2_real + 1e-8) -
-torch.log(1. - D2_fake + 1e-8))
+D1_loss = torch.mean(-torch.log(D1_real + 1e-8) - torch.log(1. - D1_fake + 1e-8))
+D2_loss = torch.mean(-torch.log(D2_real + 1e-8) - torch.log(1. - D2_fake + 1e-8))
 ```

 Then we just add up those loss. During backpropagation, `D_shared` will naturally get gradients from both `D1` and `D2`, i.e. sum of both branches. All we need to do to get the average is to scale them:
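
The 0.5 scaling the post applies next relies on how autograd handles shared parameters: gradients arriving from the `D1` branch and from the `D2` branch are summed into the same `.grad` buffer, so halving that buffer recovers the per-branch average. Here is a tiny, self-contained check of that behavior; it is illustrative only, not part of the commit, and the one-parameter layer is hypothetical.

```python
import torch

# Hypothetical one-parameter "shared" layer used by two branches.
shared = torch.nn.Linear(1, 1, bias=False)
x = torch.ones(1, 1)

# Two branch losses that both depend on the shared weight.
loss1 = shared(x).sum() * 3.0   # d(loss1)/dw = 3
loss2 = shared(x).sum() * 5.0   # d(loss2)/dw = 5

# Backprop through the summed loss: autograd accumulates (sums) both branches.
(loss1 + loss2).backward()
print(shared.weight.grad)       # tensor([[8.]])  -> 3 + 5

# Halving the accumulated gradient gives the average of the two branch gradients.
shared.weight.grad *= 0.5
print(shared.weight.grad)       # tensor([[4.]])
```
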
@@ -165,9 +164,8 @@ D_loss = D1_loss + D2_loss
 D_loss.backward()

 # Average the gradients
-
 for p in D_shared.parameters():
-p.grad.data = 0.5 \* p.grad.data
+p.grad.data = 0.5 * p.grad.data
 ```

 As we have all the gradients, we could update the weights:
@@ -180,9 +178,7 @@ reset_grad()
 For generators training, the procedure is similar to discriminators training, where we need to average the loss of `G1` and `G2` w.r.t. `G_shared`.

 ```python
-
 # Generator
-
 G1_sample = G1(z)
 D1_fake = D1(G1_sample)

@@ -196,27 +192,26 @@ G_loss = G1_loss + G2_loss
 G_loss.backward()

 # Average the gradients
-
 for p in G_shared.parameters():
-p.grad.data = 0.5 \* p.grad.data
+p.grad.data = 0.5 * p.grad.data

 G_solver.step()
 reset_grad()
 ```

 ## Results

-After many thousands of iterations, `G1` and `G2` will produce these kind of samples. Note, first two rows are the normal MNIST images, the next two rows are the rotated images. Also, the \\( z \\) that were fed into `G1` and `G2` are the same so that we could see given the same latent code \\( z \\), we could sample \\( \( x_1, x_2 \) \\) that are corresponding to each other from the joint distribution.
+After many thousands of iterations, `G1` and `G2` will produce these kind of samples. Note, first two rows are the normal MNIST images, the next two rows are the rotated images. Also, the $z$ that were fed into `G1` and `G2` are the same so that we could see given the same latent code $z$, we could sample $( x_1, x_2 )$ that are corresponding to each other from the joint distribution.

-![Result 1]({{ site.baseurl }}/img/2017-02-18-coupled-gan/res1.png)
+<BlogImage imagePath='/img/coupled-gan/res1.png' altText='Result.' />

-![Result 2]({{ site.baseurl }}/img/2017-02-18-coupled-gan/res2.png)
+<BlogImage imagePath='/img/coupled-gan/res2.png' altText='Result.' />

 Obviously, if we swap our nets with more powerful ones, we could get higher quality samples.

 If we squint, we could see that _roughly_, images at the third row are the 90 degree rotation of the first row. Also, the fourth row are the corresponding images of the second row.

-This is a marvelous results considering we did not explicitly show CoGAN the samples from joint distribution (i.e. a tuple of \\( \(x_1, x_2\) \\)). We only show samples from disjoint marginals. In summary, CoGAN is able to infer the joint distribution by itself.
+This is a marvelous results considering we did not explicitly show CoGAN the samples from joint distribution (i.e. a tuple of $(x_1, x_2)$). We only show samples from disjoint marginals. In summary, CoGAN is able to infer the joint distribution by itself.

 ## Conclusion
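
One paragraph in the diff notes that learning a joint distribution over K domains needs 2K networks, with weight sharing keeping the cost manageable. As a closing aside, here is a hedged sketch of how the two-domain generator above could generalize to K heads on one shared trunk; it is illustrative PyTorch, not code from this commit, and the sizes and the helper `G(z, k)` are assumptions.

```python
import torch

K = 3                                  # number of domains, chosen for illustration
z_dim, h_dim, X_dim = 100, 128, 784    # toy sizes, not taken from the post

# One shared trunk plus K lightweight, domain-specific heads: 2K networks
# conceptually, but most parameters live in the shared part.
G_shared = torch.nn.Sequential(torch.nn.Linear(z_dim, h_dim), torch.nn.ReLU())
G_heads = torch.nn.ModuleList(
    torch.nn.Sequential(torch.nn.Linear(h_dim, X_dim), torch.nn.Sigmoid()) for _ in range(K)
)

def G(z, k):
    """Generator for domain k: shared trunk followed by that domain's head."""
    return G_heads[k](G_shared(z))

z = torch.randn(4, z_dim)
samples = [G(z, k) for k in range(K)]

# Gradients on G_shared are the sum over the K branches, so scaling by 1/K
# gives the average, mirroring the post's 0.5 factor for two domains.
torch.stack(samples).sum().backward()
for p in G_shared.parameters():
    p.grad.data *= 1.0 / K
```
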