We have seen the Generative Adversarial Nets (GAN) model in the previous post. We have also seen the arch nemesis of GAN, the VAE, and its conditional variation: Conditional VAE (CVAE). Hence, it is only proper for us to study the conditional variation of GAN, called Conditional GAN or CGAN for short.
## CGAN: Formulation and Architecture
Recall, in GAN, we have two neural nets: the generator $G(z)$ and the discriminator $D(X)$. Now, as we want to condition those networks on some vector $y$, the easiest way to do it is to feed $y$ into both networks. Hence, our generator and discriminator are now $G(z, y)$ and $D(X, y)$ respectively.
We can see this from a probabilistic point of view. $G(z, y)$ models the distribution of our data given $z$ and $y$; that is, our data is generated with the scheme $X \sim G(X \, \vert \, z, y)$.
Likewise, the discriminator now tries to find the discriminating label for $X$ and $X_G$, modeled as $d \sim D(d \, \vert \, X, y)$.
Hence, we can see that both $D$ and $G$ are jointly conditioned on two variables: $z$ or $X$, and $y$.

In contrast with the original GAN architecture, we now have an additional input layer in both the discriminator net and the generator net.
## CGAN: Implementation in TensorFlow
Implementing CGAN is so simple that we just need to add a handful of lines to the original GAN implementation. So, here we will only look at those modifications.
The first additional code for CGAN is here:
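A minimal sketch of that addition, assuming a TensorFlow placeholder for a one-hot label of width `y_dim` (matching the weight shapes modified below):

```python
# New input: the conditioning variable y, e.g. a one-hot MNIST label of width y_dim
y = tf.placeholder(tf.float32, shape=[None, y_dim])
```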
We are adding a new input to hold the variable we are conditioning our CGAN on.
Next, we add it to both our generator net and discriminator net:
```python
def generator(z, y):
    # Concatenate z and y
    inputs = tf.concat(concat_dim=1, values=[z, y])

    G_h1 = tf.nn.relu(tf.matmul(inputs, G_W1) + G_b1)
    G_log_prob = tf.matmul(G_h1, G_W2) + G_b2
    G_prob = tf.nn.sigmoid(G_log_prob)

    return G_prob


def discriminator(x, y):
    # Concatenate x and y
    inputs = tf.concat(concat_dim=1, values=[x, y])

    D_h1 = tf.nn.relu(tf.matmul(inputs, D_W1) + D_b1)
    D_logit = tf.matmul(D_h1, D_W2) + D_b2
    D_prob = tf.nn.sigmoid(D_logit)

    return D_prob, D_logit
```
The problem we have here is how to incorporate the new variable $y$ into $D(X)$ and $G(z)$. As we are trying to model the joint conditional, the simplest way to do it is to just concatenate both variables. Hence, in $G(z, y)$, we concatenate $z$ and $y$ before feeding them into the network. The same procedure is applied to $D(X, y)$.
Of course, as our inputs for $D(X, y)$ and $G(z, y)$ are now different from those of the original GAN, we need to modify our weights:
```python
# Modify input to hidden weights for discriminator
# (xavier_init is assumed to be the weight-initialization helper from the original GAN code)
D_W1 = tf.Variable(xavier_init([X_dim + y_dim, h_dim]))

# Modify input to hidden weights for generator
G_W1 = tf.Variable(xavier_init([Z_dim + y_dim, h_dim]))
```
That is, we just adjust the dimensionality of our weights.
Next, we just use our new networks:
```python
# Add additional parameter y into all networks
G_sample = generator(Z, y)
D_real, D_logit_real = discriminator(X, y)
D_fake, D_logit_fake = discriminator(G_sample, y)
```
And finally, when training, we also feed the value of $y$ into the networks:
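The training loop is only sketched here; it reuses names from the original GAN implementation (`mnist`, `sample_Z`, `D_solver`, `G_solver`, `sess`), so treat the exact identifiers as assumptions:

```python
X_mb, y_mb = mnist.train.next_batch(mb_size)  # y_mb holds the one-hot labels
Z_sample = sample_Z(mb_size, Z_dim)

# The only change from vanilla GAN: y is added to both feed_dicts
_, D_loss_curr = sess.run([D_solver, D_loss],
                          feed_dict={X: X_mb, Z: Z_sample, y: y_mb})
_, G_loss_curr = sess.run([G_solver, G_loss],
                          feed_dict={Z: Z_sample, y: y_mb})
```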
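Likewise, generating new samples only needs the extra conditional input. A sketch with illustrative names (the one-hot index matches the description below):

```python
import numpy as np

n_sample = 16  # illustrative number of images to generate
Z_sample = sample_Z(n_sample, Z_dim)

# Conditional variables: one-hot vectors with a 1 in the 5th index
y_sample = np.zeros(shape=[n_sample, y_dim])
y_sample[:, 5] = 1.

samples = sess.run(G_sample, feed_dict={Z: Z_sample, y: y_sample})
```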
Above, we just sample $z$, and then construct the conditional variables. In our example case, the conditional variables are a collection of one-hot vectors with the value 1 in the 5th index. The last thing we need to do is to run the network with those variables as inputs.
`src/content/post/conditional-vae.mdx` (publishDate: 2016-12-17 11:04, tags: [programming, python, neuralnet])
Conditional Variational Autoencoder (CVAE) is an extension of Variational Autoencoder (VAE), a generative model that we have studied in the last post. We've seen that by formulating the problem of data generation as a Bayesian model, we could optimize its variational lower bound to learn the model.
However, we have no control over the data generation process in VAE. This could be problematic if we want to generate some specific data. As an example, suppose we want to convert a unicode character to handwriting. In vanilla VAE, there is no way to generate the handwriting based on the character that the user inputted. Concretely, suppose the user inputs the character '2': how do we generate a handwriting image of the character '2'? We couldn't.
Recall, in VAE, the objective is:
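Restated here in its standard variational lower bound form:

$$
\log P(X) - D_{KL}\left[ Q(z \vert X) \,\|\, P(z \vert X) \right] = E_{z \sim Q(z \vert X)}\left[ \log P(X \vert z) \right] - D_{KL}\left[ Q(z \vert X) \,\|\, P(z) \right]
$$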
that is, we want to optimize the log likelihood of our data $P(X)$ under some "encoding" error. The original VAE model has two parts: the encoder $Q(z \vert X)$ and the decoder $P(X \vert z)$.
Looking closely at the model, we can see why VAE can't generate specific data, as per our example above. It's because the encoder models the latent variable $z$ directly based on $X$; it doesn't care about the different types of $X$. For example, it doesn't take into account the label of $X$.
Similarly, in the decoder part, it only models $X$ directly based on the latent variable $z$.
We could improve VAE by conditioning the encoder and decoder on another variable (or variables). Let's say that other variable is $c$, so the encoder is now conditioned on two variables $X$ and $c$: $Q(z \vert X, c)$. The same goes for the decoder: it's now conditioned on two variables $z$ and $c$: $P(X \vert z, c)$.
Hence, our variational lower bound objective now takes the following form:
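Conditioning every distribution on $c$, the standard form of that bound is:

$$
\log P(X \vert c) - D_{KL}\left[ Q(z \vert X, c) \,\|\, P(z \vert X, c) \right] = E_{z \sim Q(z \vert X, c)}\left[ \log P(X \vert z, c) \right] - D_{KL}\left[ Q(z \vert X, c) \,\|\, P(z \vert c) \right]
$$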
i.e., we just condition all of the distributions on a variable $c$.
Now, the real latent variable is distributed under $P(z \vert c )$. That is, it's now a conditional probability distribution (CPD). Think about it like this: for each possible value of $c$, we would have a $P(z)$. We could also use this form of thinking for the decoder.
## CVAE: Implementation
The conditional variable $c$ could be anything. We could assume it comes from a categorical distribution expressing the label of our data, a Gaussian expressing some regression target, or even the same distribution as the data (e.g. for image inpainting: conditioning the model on an incomplete image).
Let's use MNIST for example. We could use the label as our conditional variable $c$. In this case, $c$ is categorically distributed, or in other words, it takes the form of a one-hot vector of the label:
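A minimal sketch of that setup, mirroring the concatenation trick from the CGAN code above (the dimension names `X_dim` and `y_dim` are assumptions, e.g. 784 and 10 for MNIST):

```python
# The data X and the one-hot label c are both network inputs
X = tf.placeholder(tf.float32, shape=[None, X_dim])
c = tf.placeholder(tf.float32, shape=[None, y_dim])

# Conditioning is again just concatenation, e.g. at the input of the encoder Q(z | X, c)
inputs = tf.concat(concat_dim=1, values=[X, c])
```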
For the full explanation of the code, please refer to my original VAE post. The full code can be found in my GitHub repo: https://github.com/wiseodd/generative-models.
## Conditional MNIST
We will test our CVAE model by generating MNIST data conditioned on its label. With the above model, we could specify which digit we want to generate, as the model is conditioned on the label!
First things first, let's visualize $Q(z \vert X, c)$:
Things are messy here, in contrast to VAE's $Q(z \vert X)$, which nicely clusters $z$. But if we look at it closely, we could see that given a specific value of $c = y$, $Q(z \vert X, c=y)$ is roughly $N(0, 1)$! It's because, if we look at our objective above, we are now modeling $P(z \vert c)$, which we infer variationally with a $N(0, 1)$.
Subjectively, we could say the reconstruction results are way better than the original VAE's! We could argue that because the data under each specific label has its own distribution, it is easy to sample data with a specific label. If we look back at the results of the original VAE, the reconstructions suffer at the edge cases, e.g. when the model is not sure whether it's a 3, 8, or 5, as they look very similar. No such problem here!
Now the interesting part: we could generate new data under our specific condition. Above, for example, we generate new data which has the label '5', i.e. $c = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]$. CVAE makes it possible for us to do that.
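A sketch of how such conditional samples might be drawn (the decoder output tensor `X_samples` and the placeholders `z` and `c` are illustrative names, not the exact ones from the repo):

```python
import numpy as np

n_samples = 16
z_sample = np.random.randn(n_samples, Z_dim)  # z ~ N(0, 1)

c_sample = np.zeros([n_samples, y_dim])
c_sample[:, 5] = 1.  # condition on the label '5'

samples = sess.run(X_samples, feed_dict={z: z_sample, c: c_sample})
```

Setting the 1 at a different index of `c_sample` would generate the corresponding digit instead.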