
Commit d7a30c0

More md -> mdx

1 parent a95799b


9 files changed: +441 −438 lines


.astro/types.d.ts

Lines changed: 6 additions & 6 deletions
@@ -185,13 +185,13 @@ declare module 'astro:content' {
       collection: "post";
       data: InferEntrySchema<"post">
     } & { render(): Render[".mdx"] };
-    "conditional-gan-tensorflow.md": {
-      id: "conditional-gan-tensorflow.md";
+    "conditional-gan-tensorflow.mdx": {
+      id: "conditional-gan-tensorflow.mdx";
       slug: "conditional-gan-tensorflow";
       body: string;
       collection: "post";
       data: InferEntrySchema<"post">
-    } & { render(): Render[".md"] };
+    } & { render(): Render[".mdx"] };
     "conditional-vae.md": {
       id: "conditional-vae.md";
       slug: "conditional-vae";
@@ -409,13 +409,13 @@ declare module 'astro:content' {
       collection: "post";
       data: InferEntrySchema<"post">
     } & { render(): Render[".mdx"] };
-    "mle-vs-map.md": {
-      id: "mle-vs-map.md";
+    "mle-vs-map.mdx": {
+      id: "mle-vs-map.mdx";
       slug: "mle-vs-map";
       body: string;
       collection: "post";
       data: InferEntrySchema<"post">
-    } & { render(): Render[".md"] };
+    } & { render(): Render[".mdx"] };
     "natural-gradient.mdx": {
       id: "natural-gradient.mdx";
       slug: "natural-gradient";

bun.lockb

0 Bytes
Binary file not shown.

src/content/post/conditional-gan-tensorflow.md renamed to src/content/post/conditional-gan-tensorflow.mdx

Lines changed: 37 additions & 33 deletions
@@ -5,38 +5,40 @@ publishDate: 2016-12-24 05:30
 tags: [machine learning, programming, python, neural networks, gan]
 ---
 
-We have seen the Generative Adversarial Nets (GAN) model in [the previous post]({% post_url 2016-09-17-gan-tensorflow %}). We have also seen the arch nemesis of GAN, the VAE and its conditional variation: Conditional VAE (CVAE). Hence, it is only proper for us to study conditional variation of GAN, called Conditional GAN or CGAN for short.
+import BlogImage from "@/components/BlogImage.astro";
+
+We have seen the Generative Adversarial Nets (GAN) model in the previous post. We have also seen the arch nemesis of GAN, the VAE and its conditional variation: Conditional VAE (CVAE). Hence, it is only proper for us to study conditional variation of GAN, called Conditional GAN or CGAN for short.
 
 ## CGAN: Formulation and Architecture
 
-Recall, in GAN, we have two neural nets: the generator \\( G(z) \\) and the discriminator \\( D(X) \\). Now, as we want to condition those networks with some vector \\( y \\), the easiest way to do it is to feed \\( y \\) into both networks. Hence, our generator and discriminator are now \\( G(z, y) \\) and \\( D(X, y) \\) respectively.
+Recall, in GAN, we have two neural nets: the generator $G(z)$ and the discriminator $D(X)$. Now, as we want to condition those networks with some vector $y$, the easiest way to do it is to feed $y$ into both networks. Hence, our generator and discriminator are now $G(z, y)$ and $D(X, y)$ respectively.
 
-We can see it with a probabilistic point of view. \\( G(z, y) \\) is modeling the distribution of our data, given \\( z \\) and \\( y \\), that is, our data is generated with this scheme \\( X \sim G(X \, \vert \, z, y) \\).
+We can see it with a probabilistic point of view. $G(z, y)$ is modeling the distribution of our data, given $z$ and $y$, that is, our data is generated with this scheme $X \sim G(X \, \vert \, z, y)$.
 
-Likewise for the discriminator, now it tries to find discriminating label for \\( X \\) and \\( X_G \\), that are modeled with \\( d \sim D(d \, \vert \, X, y) \\).
+Likewise for the discriminator, now it tries to find discriminating label for $X$ and $X_G$, that are modeled with $d \sim D(d \, \vert \, X, y)$.
 
-Hence, we could see that both \\( D \\) and \\( G \\) is jointly conditioned to two variables \\( z \\) or \\( X \\) and \\( y \\).
+Hence, we could see that both $D$ and $G$ is jointly conditioned to two variables $z$ or $X$ and $y$.
 
 Now, the objective function is given by:
 
 $$
-
 \min_G \max_D V(D, G) = \mathop{\mathbb{E}}_{x \sim p_{data}(x)} [\log D(x, y)] + \mathop{\mathbb{E}}_{z \sim p_z(z)} [\log(1 - D(G(z, y), y))]
-
-
 $$
 
-If we compare the above loss to GAN loss, the difference only lies in the additional parameter \\( y \\) in both \\( D \\) and \\( G \\).
+If we compare the above loss to GAN loss, the difference only lies in the additional parameter $y$ in both $D$ and $G$.
 
 The architecture of CGAN is now as follows (taken from [1]):
 
-![CGAN arch]({{ site.baseurl }}/img/2016-12-24-conditional-gan-tensorflow/arch.png)
+<BlogImage
+  imagePath='/img/conditional-gan-tensorflow/arch.png'
+  altText='CGAN architecture.'
+/>
 
 In contrast with the architecture of GAN, we now has an additional input layer in both discriminator net and generator net.
 
 ## CGAN: Implementation in TensorFlow
 
-I'd like to direct the reader to the [previous post about GAN]({% post_url 2016-09-17-gan-tensorflow %}), particularly for the implementation in TensorFlow. Implementing CGAN is so simple that we just need to add a handful of lines to the original GAN implementation. So, here we will only look at those modifications.
+Implementing CGAN is so simple that we just need to add a handful of lines to the original GAN implementation. So, here we will only look at those modifications.
 
 The first additional code for CGAN is here:
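
The snippet that sentence points to sits in unchanged lines, so it does not appear in this hunk. As a rough sketch of what that "first additional code" looks like in the TF 1.x style of the surrounding snippets (the dimension values and the compat import are assumptions, not part of the commit):

```python
# Sketch only (not in this diff): the extra conditional input for CGAN,
# assuming TF 1.x graph mode and MNIST-sized tensors.
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

X_dim, Z_dim, y_dim = 784, 100, 10  # assumed dimensions

X = tf.placeholder(tf.float32, shape=[None, X_dim])  # real images
Z = tf.placeholder(tf.float32, shape=[None, Z_dim])  # noise
y = tf.placeholder(tf.float32, shape=[None, y_dim])  # NEW: conditional one-hot labels
```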

@@ -49,38 +51,37 @@ We are adding new input to hold our variable we are conditioning our CGAN to.
 Next, we add it to both our generator net and discriminator net:
 
 ```python
-def generator(z, y): # Concatenate z and y
-    inputs = tf.concat(concat_dim=1, values=[z, y])
+def generator(z, y):
+    # Concatenate z and y
+    inputs = tf.concat(concat_dim=1, values=[z, y])
 
     G_h1 = tf.nn.relu(tf.matmul(inputs, G_W1) + G_b1)
     G_log_prob = tf.matmul(G_h1, G_W2) + G_b2
     G_prob = tf.nn.sigmoid(G_log_prob)
 
     return G_prob
 
-def discriminator(x, y): # Concatenate x and y
-    inputs = tf.concat(concat_dim=1, values=[x, y])
+
+def discriminator(x, y):
+    # Concatenate x and y
+    inputs = tf.concat(concat_dim=1, values=[x, y])
 
     D_h1 = tf.nn.relu(tf.matmul(inputs, D_W1) + D_b1)
     D_logit = tf.matmul(D_h1, D_W2) + D_b2
     D_prob = tf.nn.sigmoid(D_logit)
 
     return D_prob, D_logit
-
 ```
 
-The problem we have here is how to incorporate the new variable \\( y \\) into \\( D(X) \\) and \\( G(z) \\). As we are trying to model the joint conditional, the simplest way to do it is to just concatenate both variables. Hence, in \\( G(z, y) \\), we are concatenating \\( z \\) and \\( y \\) before we feed it into the networks. The same procedure is applied to \\( D(X, y) \\).
+The problem we have here is how to incorporate the new variable $y$ into $D(X)$ and $G(z)$. As we are trying to model the joint conditional, the simplest way to do it is to just concatenate both variables. Hence, in $G(z, y)$, we are concatenating $z$ and $y$ before we feed it into the networks. The same procedure is applied to $D(X, y)$.
 
-Of course, as our inputs for \\( D(X, y) \\) and \\( G(z, y) \\) is now different than the original GAN, we need to modify our weights:
+Of course, as our inputs for $D(X, y)$ and $G(z, y)$ is now different than the original GAN, we need to modify our weights:
 
 ```python
-
 # Modify input to hidden weights for discriminator
-
 D_W1 = tf.Variable(shape=[X_dim + y_dim, h_dim]))
 
 # Modify input to hidden weights for generator
-
 G_W1 = tf.Variable(shape=[Z_dim + y_dim, h_dim]))
 ```
 
@@ -89,25 +90,23 @@ That is, we just adjust the dimensionality of our weights.
 Next, we just use our new networks:
 
 ```python
-
 # Add additional parameter y into all networks
-
 G_sample = generator(Z, y)
 D_real, D_logit_real = discriminator(X, y)
 D_fake, D_logit_fake = discriminator(G_sample, y)
 ```
 
-And finally, when training, we also feed the value of \\( y \\) into the networks:
+And finally, when training, we also feed the value of $y$ into the networks:
 
 ```python
 X_mb, y_mb = mnist.train.next_batch(mb_size)
 
-Z*sample = sample_Z(mb_size, Z_dim)
-*, D*loss_curr = sess.run([D_solver, D_loss], feed_dict={X: X_mb, Z: Z_sample, y:y_mb})
-*, G_loss_curr = sess.run([G_solver, G_loss], feed_dict={Z: Z_sample, y:y_mb})
+Z_sample = sample_Z(mb_size, Z_dim)
+_, D_loss_curr = sess.run([D_solver, D_loss], feed_dict={X: X_mb, Z: Z_sample, y:y_mb})
+_, G_loss_curr = sess.run([G_solver, G_loss], feed_dict={Z: Z_sample, y:y_mb})
 ```
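
The training step above references `D_solver`, `D_loss`, `G_solver`, and `G_loss`, which stay as in the vanilla GAN implementation and therefore do not show up in this diff. A hedged sketch of those definitions, with names assumed to match the snippets above:

```python
# Sketch only: the loss/optimizer definitions the feed_dict step above relies on,
# carried over from the vanilla GAN setup (names are assumptions, not diff content).
D_loss = -tf.reduce_mean(tf.log(D_real) + tf.log(1. - D_fake))
G_loss = -tf.reduce_mean(tf.log(D_fake))

theta_D = [D_W1, D_W2, D_b1, D_b2]
theta_G = [G_W1, G_W2, G_b1, G_b2]

D_solver = tf.train.AdamOptimizer().minimize(D_loss, var_list=theta_D)
G_solver = tf.train.AdamOptimizer().minimize(G_loss, var_list=theta_G)
```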
 
-As an example above, we are training our GAN with MNIST data, and the conditional variable \\( y \\) is the labels.
+As an example above, we are training our GAN with MNIST data, and the conditional variable $y$ is the labels.
 
 ## CGAN: Results
 
@@ -118,24 +117,29 @@ n_sample = 16
 Z_sample = sample_Z(n_sample, Z_dim)
 
 # Create conditional one-hot vector, with index 5 = 1
-
 y_sample = np.zeros(shape=[n_sample, y_dim])
 y_sample[:, 5] = 1
 
 samples = sess.run(G_sample, feed_dict={Z: Z_sample, y:y_sample})
 ```
 
-Above, we just sample \\( z \\), and then construct the conditional variables. In our example case, the conditional variables is a collection of one-hot vectors with value 1 in the 5th index. The last thing we need to is to run the network with those variables as inputs.
+Above, we just sample $z$, and then construct the conditional variables. In our example case, the conditional variables is a collection of one-hot vectors with value 1 in the 5th index. The last thing we need to is to run the network with those variables as inputs.
 
 Here is the results:
 
-![Sample 5]({{ site.baseurl }}/img/2016-12-24-conditional-gan-tensorflow/5.png)
+<BlogImage
+  imagePath='/img/conditional-gan-tensorflow/5.png'
+  altText='Conditional samples.'
+/>
 
 Looks pretty much like digit 5, right?
 
 If we set our one-hot vectors to have value of 1 in the 7th index:
 
-![Sample 7]({{ site.baseurl }}/img/2016-12-24-conditional-gan-tensorflow/7.png)
+<BlogImage
+  imagePath='/img/conditional-gan-tensorflow/7.png'
+  altText='Conditional samples.'
+/>
 
 Those results confirmed that have successfully trained our CGAN.
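
To see the conditioning at work beyond a single digit, a small sketch (reusing `sess`, `G_sample`, `sample_Z`, and the dimensions from the snippets above) sweeps the one-hot index from 0 to 9:

```python
# Sketch only: generate a batch of samples for every digit by moving the
# one-hot index; assumes the session and graph built in the snippets above.
import numpy as np

n_sample = 16
for digit in range(10):
    y_sample = np.zeros(shape=[n_sample, y_dim])
    y_sample[:, digit] = 1.
    Z_sample = sample_Z(n_sample, Z_dim)
    samples = sess.run(G_sample, feed_dict={Z: Z_sample, y: y_sample})
    # `samples` now holds n_sample generated images of the requested digit.
```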

@@ -145,7 +149,7 @@ In this post, we looked at the analogue of CVAE for GAN: the Conditional GAN (CG
 
 The conditional variables for CGAN, just like CVAE, could be anything. Hence it makes CGAN an interesting model to work with for data modeling.
 
-The full code is available at my GitHub repo: <https://github.com/wiseodd/generative-models>.
+The full code is available at my GitHub repo: https://github.com/wiseodd/generative-models.
 
 ## References
 
Lines changed: 26 additions & 26 deletions
@@ -5,7 +5,9 @@ publishDate: 2016-12-17 11:04
 tags: [programming, python, neuralnet]
 ---
 
-Conditional Variational Autoencoder (CVAE) is an extension of [Variational Autoencoder (VAE)]({% post_url 2016-12-10-variational-autoencoder %}), a generative model that we have studied in the last post. We've seen that by formulating the problem of data generation as a bayesian model, we could optimize its variational lower bound to learn the model.
+import BlogImage from "@/components/BlogImage.astro";
+
+Conditional Variational Autoencoder (CVAE) is an extension of Variational Autoencoder (VAE), a generative model that we have studied in the last post. We've seen that by formulating the problem of data generation as a bayesian model, we could optimize its variational lower bound to learn the model.
 
 However, we have no control on the data generation process on VAE. This could be problematic if we want to generate some specific data. As an example, suppose we want to convert a unicode character to handwriting. In vanilla VAE, there is no way to generate the handwriting based on the character that the user inputted. Concretely, suppose the user inputted character '2', how do we generate handwriting image that is a character '2'? We couldn't.
 
@@ -17,27 +19,27 @@ Recall, on VAE, the objective is:
 
 $$ \log P(X) - D*{KL}[Q(z \vert X) \Vert P(z \vert X)] = E[\log P(X \vert z)] - D*{KL}[Q(z \vert X) \Vert P(z)] $$
 
-that is, we want to optimize the log likelihood of our data \\( P(X) \\) under some "encoding" error. The original VAE model has two parts: the encoder \\( Q(z \vert X) \\) and the decoder \\( P(X \vert z) \\).
+that is, we want to optimize the log likelihood of our data $P(X)$ under some "encoding" error. The original VAE model has two parts: the encoder $Q(z \vert X)$ and the decoder $P(X \vert z)$.
 
-Looking closely at the model, we could see why can't VAE generate specific data, as per our example above. It's because the encoder models the latent variable \\( z \\) directly based on \\( X \\), it doesn't care about the different type of \\( X \\). For example, it doesn't take any account on the label of \\( X \\).
+Looking closely at the model, we could see why can't VAE generate specific data, as per our example above. It's because the encoder models the latent variable $z$ directly based on $X$, it doesn't care about the different type of $X$. For example, it doesn't take any account on the label of $X$.
 
-Similarly, in the decoder part, it only models \\( X \\) directly based on the latent variable \\( z \\).
+Similarly, in the decoder part, it only models $X$ directly based on the latent variable $z$.
 
-We could improve VAE by conditioning the encoder and decoder to another thing(s). Let's say that other thing is \\( c \\), so the encoder is now conditioned to two variables \\( X \\) and \\( c \\): \\( Q(z \vert X, c) \\). The same with the decoder, it's now conditioned to two variables \\( z \\) and \\( c \\): \\( P(X \vert z, c) \\).
+We could improve VAE by conditioning the encoder and decoder to another thing(s). Let's say that other thing is $c$, so the encoder is now conditioned to two variables $X$ and $c$: $Q(z \vert X, c)$. The same with the decoder, it's now conditioned to two variables $z$ and $c$: $P(X \vert z, c)$.
 
 Hence, our variational lower bound objective is now in this following form:
 
 $$ \log P(X \vert c) - D*{KL}[Q(z \vert X, c) \Vert P(z \vert X, c)] = E[\log P(X \vert z, c)] - D*{KL}[Q(z \vert X, c) \Vert P(z \vert c)] $$
 
-i.e. we just conditioned all of the distributions with a variable \\( c \\).
+i.e. we just conditioned all of the distributions with a variable $c$.
 
-Now, the real latent variable is distributed under \\( P(z \vert c ) \\). That is, it's now a conditional probability distribution (CPD). Think about it like this: for each possible value of \\( c \\), we would have a \\( P(z) \\). We could also use this form of thinking for the decoder.
+Now, the real latent variable is distributed under $P(z \vert c )$. That is, it's now a conditional probability distribution (CPD). Think about it like this: for each possible value of $c$, we would have a $P(z)$. We could also use this form of thinking for the decoder.
 
 ## CVAE: Implementation
 
-The conditional variable \\( c \\) could be anything. We could assume it comes from a categorical distribution expressing the label of our data, gaussian expressing some regression target, or even the same distribution as the data (e.g. for image inpainting: conditioning the model to incomplete image).
+The conditional variable $c$ could be anything. We could assume it comes from a categorical distribution expressing the label of our data, gaussian expressing some regression target, or even the same distribution as the data (e.g. for image inpainting: conditioning the model to incomplete image).
 
-Let's use MNIST for example. We could use the label as our conditional variable \\( c \\). In this case, \\( c \\) is categorically distributed, or in other words, it takes form as an one-hot vector of label:
+Let's use MNIST for example. We could use the label as our conditional variable $c$. In this case, $c$ is categorically distributed, or in other words, it takes form as an one-hot vector of label:
 
 ```python
 mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
@@ -51,7 +53,6 @@ n_z = 2
 n_epoch = 20
 
 # Q(z|X,y) -- encoder
-
 X = Input(batch_shape=(m, n_x))
 cond = Input(batch_shape=(m, n_y))
 ```
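
The lines that concatenate `X` with `cond` and build the encoder sit outside the changed hunks, so they are not shown here. A sketch of that wiring, using the same Keras 1.x `merge`/`Dense` API that appears below (layer sizes and names are assumptions, not the commit's code):

```python
# Sketch only (not part of this diff): conditioning the encoder Q(z|X,y)
# by concatenating the data with the one-hot label, Keras 1.x functional API.
from keras.layers import Dense, merge

inputs = merge([X, cond], mode='concat', concat_axis=1)
h_q = Dense(512, activation='relu')(inputs)
mu = Dense(n_z, activation='linear')(h_q)
log_sigma = Dense(n_z, activation='linear')(h_q)
```
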
@@ -70,17 +71,15 @@ Similarly, the decoder is also concatenated with the conditional vector:
 
 ```python
 def sample_z(args):
-    mu, log_sigma = args
-    eps = K.random_normal(shape=(m, n_z), mean=0., std=1.)
-    return mu + K.exp(log_sigma / 2) \* eps
+    mu, log_sigma = args
+    eps = K.random_normal(shape=(m, n_z), mean=0., std=1.)
+    return mu + K.exp(log_sigma / 2) * eps
 
 # Sample z ~ Q(z|X,y)
-
 z = Lambda(sample_z)([mu, log_sigma])
 z_cond = merge([z, cond], mode='concat', concat_axis=1) # <--- NEW!
 
 # P(X|z,y) -- decoder
-
 decoder_hidden = Dense(512, activation='relu')
 decoder_out = Dense(784, activation='sigmoid')
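
How `decoder_hidden` and `decoder_out` get applied to `z_cond`, and how the whole model is assembled, also falls outside the changed lines. A sketch under the same Keras 1.x assumptions (not the commit's code):

```python
# Sketch only: wire the decoder layers to the conditioned latent code and
# assemble the end-to-end model (Keras 1.x keyword names).
from keras.models import Model

h_p = decoder_hidden(z_cond)   # P(X|z,y)
outputs = decoder_out(h_p)     # 784-dim Bernoulli parameters per pixel

cvae = Model(input=[X, cond], output=outputs)
```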

@@ -92,35 +91,36 @@ The rest is similar to VAE. Heck, even we don't need to modify the objective. Ev
 
 ```python
 def vae_loss(y_true, y_pred):
-    """ Calculate loss = reconstruction loss + KL loss for each data in minibatch """ # E[log P(X|z,y)]
-    recon = K.sum(K.binary_crossentropy(y_pred, y_true), axis=1) # D_KL(Q(z|X,y) || P(z|X)); calculate in closed form as both dist. are Gaussian
-    kl = 0.5 \* K.sum(K.exp(log_sigma) + K.square(mu) - 1. - log_sigma, axis=1)
+    """ Calculate loss = reconstruction loss + KL loss for each data in minibatch """
+    # E[log P(X|z,y)]
+    recon = K.sum(K.binary_crossentropy(y_pred, y_true), axis=1)
+    # D_KL(Q(z|X,y) || P(z|X)); calculate in closed form as both dist. are Gaussian
+    kl = 0.5 * K.sum(K.exp(log_sigma) + K.square(mu) - 1. - log_sigma, axis=1)
 
     return recon + kl
-
 ```
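
With the loss defined as above, training is the standard Keras loop; the only difference from the plain VAE is that the one-hot labels ride along as a second input. A sketch, with `X_train`/`y_train` standing in for the one-hot-encoded MNIST arrays loaded earlier (assumptions, not diff content):

```python
# Sketch only: compile and fit the conditional model; X_train/y_train are
# assumed to be the MNIST images and one-hot labels from the data loading above.
cvae.compile(optimizer='adam', loss=vae_loss)
cvae.fit([X_train, y_train], X_train, batch_size=m, nb_epoch=n_epoch)
```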
 
-For the full explanation of the code, please refer to my [original VAE post]({% post_url 2016-12-10-variational-autoencoder %}). The full code could be found in my Github repo: <https://github.com/wiseodd/generative-models>.
+For the full explanation of the code, please refer to my original VAE post. The full code could be found in my Github repo: https://github.com/wiseodd/generative-models.
 
 ## Conditional MNIST
 
 We will test our CVAE model to generate MNIST data, conditioned to its label. With the above model, we could specify which digit we want to generate, as it is conditioned to the label!
 
-First thing first, let's visualize \\( Q(z \vert X, c) \\):
+First thing first, let's visualize $Q(z \vert X, c)$:
 
-![Q(z \vert X)]({{ site.baseurl }}/img/2016-12-17-conditional-vae/z_dist_cvae.png)
+<BlogImage imagePath='/img/conditional-vae/z_dist_cvae.png' />
 
-Things are messy here, in contrast to VAE's \\( Q(z \vert X) \\), which nicely clusters \\( z \\). But if we look at it closely, we could see that given a specific value of \\( c = y \\), \\( Q(z \vert X, c=y) \\) is roughly \\( N(0, 1) \\)! It's because, if we look at our objective above, we are now modeling \\( P(z \vert c) \\), which we infer variationally with a \\( N(0, 1) \\).
+Things are messy here, in contrast to VAE's $Q(z \vert X)$, which nicely clusters $z$. But if we look at it closely, we could see that given a specific value of $c = y$, $Q(z \vert X, c=y)$ is roughly $N(0, 1)$! It's because, if we look at our objective above, we are now modeling $P(z \vert c)$, which we infer variationally with a $N(0, 1)$.
 
 Next, let's try to reconstruct some images:
 
-![Reconstruction]({{ site.baseurl }}/img/2016-12-17-conditional-vae/reconstruction_cvae.png)
+<BlogImage imagePath='/img/conditional-vae/reconstruction_cvae.png' fullWidth />
 
 Subjectively, we could say the reconstruction results are way better than the original VAE! We could argue that because each data under specific label has its own distribution, hence it is easy to sample data with a specific label. If we look back at the result of the original VAE, the reconstructions suffer at the edge cases, e.g. when the model is not sure if it's 3, 8, or 5, as they look very similar. No such problem here!
 
-![Generation]({{ site.baseurl }}/img/2016-12-17-conditional-vae/generation_cvae.png)
+<BlogImage imagePath='/img/conditional-vae/generation_cvae.png' fullWidth />
 
-Now the interesting part. We could generate a new data under our specific condition. Above, for example, we generate new data which has the label of '5', i.e. \\( c = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0] \\). CVAE make it possible for us to do that.
+Now the interesting part. We could generate a new data under our specific condition. Above, for example, we generate new data which has the label of '5', i.e. $c = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]$. CVAE make it possible for us to do that.
 
 ## Conclusion
 