
Commit 38a9d53

More md -> mdx
1 parent 7882c4a commit 38a9d53


6 files changed (+148 / -121 lines)


.astro/types.d.ts

Lines changed: 15 additions & 15 deletions
@@ -143,13 +143,13 @@ declare module 'astro:content' {
 collection: "post";
 data: InferEntrySchema<"post">
 } & { render(): Render[".mdx"] };
-"autoencoders.md": {
-id: "autoencoders.md";
+"autoencoders.mdx": {
+id: "autoencoders.mdx";
 slug: "autoencoders";
 body: string;
 collection: "post";
 data: InferEntrySchema<"post">
-} & { render(): Render[".md"] };
+} & { render(): Render[".mdx"] };
 "batchnorm.md": {
 id: "batchnorm.md";
 slug: "batchnorm";
@@ -199,13 +199,13 @@ declare module 'astro:content' {
 collection: "post";
 data: InferEntrySchema<"post">
 } & { render(): Render[".mdx"] };
-"contractive-autoencoder.md": {
-id: "contractive-autoencoder.md";
+"contractive-autoencoder.mdx": {
+id: "contractive-autoencoder.mdx";
 slug: "contractive-autoencoder";
 body: string;
 collection: "post";
 data: InferEntrySchema<"post">
-} & { render(): Render[".md"] };
+} & { render(): Render[".mdx"] };
 "conv-probit.mdx": {
 id: "conv-probit.mdx";
 slug: "conv-probit";
@@ -360,20 +360,20 @@ declare module 'astro:content' {
 collection: "post";
 data: InferEntrySchema<"post">
 } & { render(): Render[".mdx"] };
-"levelset-method.md": {
-id: "levelset-method.md";
+"levelset-method.mdx": {
+id: "levelset-method.mdx";
 slug: "levelset-method";
 body: string;
 collection: "post";
 data: InferEntrySchema<"post">
-} & { render(): Render[".md"] };
-"levelset-segmentation.md": {
-id: "levelset-segmentation.md";
+} & { render(): Render[".mdx"] };
+"levelset-segmentation.mdx": {
+id: "levelset-segmentation.mdx";
 slug: "levelset-segmentation";
 body: string;
 collection: "post";
 data: InferEntrySchema<"post">
-} & { render(): Render[".md"] };
+} & { render(): Render[".mdx"] };
 "lstm-backprop.md": {
 id: "lstm-backprop.md";
 slug: "lstm-backprop";
@@ -465,13 +465,13 @@ declare module 'astro:content' {
 collection: "post";
 data: InferEntrySchema<"post">
 } & { render(): Render[".md"] };
-"residual-net.md": {
-id: "residual-net.md";
+"residual-net.mdx": {
+id: "residual-net.mdx";
 slug: "residual-net";
 body: string;
 collection: "post";
 data: InferEntrySchema<"post">
-} & { render(): Render[".md"] };
+} & { render(): Render[".mdx"] };
 "scrapy-long-running.md": {
 id: "scrapy-long-running.md";
 slug: "scrapy-long-running";
src/content/post/autoencoders.md renamed to src/content/post/autoencoders.mdx

Lines changed: 8 additions & 8 deletions
@@ -5,15 +5,15 @@ publishDate: 2016-12-03 12:20
 tags: [programming, python, neuralnet]
 ---

-Consider a neural net. Usually we use it for classification and regression task, that is, given an input vector \\( X \\), we want to find \\( y \\). In other words, we want neural net to find a mapping \\( y = f(X) \\).
+Consider a neural net. Usually we use it for classification and regression task, that is, given an input vector $X$, we want to find $y$. In other words, we want neural net to find a mapping $y = f(X)$.

-Now, what happens if we use the same data as codomain of the function? That is, we want to find a mapping \\( X = f(X) \\). Well, the neural net now will learn an identity mapping of \\( X \\). We probably would ask, how is that useful?
+Now, what happens if we use the same data as codomain of the function? That is, we want to find a mapping $X = f(X)$. Well, the neural net now will learn an identity mapping of $X$. We probably would ask, how is that useful?

 It turns out, the hidden layer(s) of neural net learns a very interesting respresentation of the data. Hence, we can use the hidden layer representation for many things, for example data compression, dimensionality reduction, and feature learning. This is exactly the last decade idea of Deep Learning: by stacking Autoencoders to learn the representation of data, and train it greedily, hopefully we can train deep net effectively.

 ## Vanilla Autoencoder

-In its simplest form, Autoencoder is a two layer net, i.e. a neural net with one hidden layer. The input and output are the same, and we learn how to reconstruct the input, for example using the \\( \ell\_{2} \\) norm.
+In its simplest form, Autoencoder is a two layer net, i.e. a neural net with one hidden layer. The input and output are the same, and we learn how to reconstruct the input, for example using the $\ell_{2}$ norm.

 ```python
 from tensorflow.examples.tutorials.mnist import input_data
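The hunk above cuts off right where the post's vanilla Autoencoder code begins. As a rough sketch of the two-layer setup it describes (assumptions: random data standing in for MNIST, a sigmoid 784-unit decoder layer, and current Keras argument names such as `epochs` rather than the post's `nb_epoch`), the idea looks like this:

```python
# Minimal vanilla Autoencoder sketch (illustrative, not the post's exact code):
# learn the identity mapping X = f(X) through a 64-unit bottleneck.
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

X = np.random.rand(1024, 784).astype('float32')  # stand-in for the MNIST images

inputs = Input(shape=(784,))
h = Dense(64, activation='sigmoid')(inputs)      # hidden representation (the "code")
outputs = Dense(784, activation='sigmoid')(h)    # reconstruction of the input

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')      # l2-style reconstruction loss
model.fit(X, X, batch_size=64, epochs=5)         # note: the target is the input itself
```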
@@ -27,8 +27,8 @@ import matplotlib.pyplot as plt
 import keras.backend as K
 import tensorflow as tf

-mnist = input*data.read_data_sets('../data/MNIST_data', one_hot=True)
-X, * = mnist.train.images, mnist.train.labels
+mnist = input_data.read_data_sets('../data/MNIST_data', one_hot=True)
+X, _ = mnist.train.images, mnist.train.labels

 inputs = Input(shape=(784,))
 h = Dense(64, activation='sigmoid')(inputs)
@@ -57,7 +57,7 @@ model.compile(optimizer='adam', loss='mse')
 model.fit(X, X, batch_size=64, nb_epoch=5)
 ```

-Notice in our hidden layer, we added an \\( \ell\_{1} \\) penalty. As a result, the representation is now sparser compared to the vanilla Autoencoder. We could see that by looking at the statistics of the hidden layer. The mean value of vanilla Autoencoder is 0.512477, whereas Sparse Autoencoder 0.148664.
+Notice in our hidden layer, we added an $\ell_{1}$ penalty. As a result, the representation is now sparser compared to the vanilla Autoencoder. We could see that by looking at the statistics of the hidden layer. The mean value of vanilla Autoencoder is 0.512477, whereas Sparse Autoencoder 0.148664.

 ## Multilayer Autoencoder

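The hunk above mentions adding an $\ell_{1}$ penalty to the hidden layer, but the excerpt does not show how it is imposed. One common way to do it in Keras is an activity regularizer on the hidden layer; this is a sketch assuming `keras.regularizers.l1` with an arbitrary weight of `1e-5`, not necessarily the mechanism the post uses:

```python
# Sparse Autoencoder sketch (illustrative): same architecture as the vanilla one,
# but with an L1 activity penalty pushing the hidden activations towards zero.
from keras import regularizers
from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(784,))
h = Dense(64, activation='sigmoid',
          activity_regularizer=regularizers.l1(1e-5))(inputs)  # sparsity penalty
outputs = Dense(784, activation='sigmoid')(h)

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')
# model.fit(X, X, batch_size=64, epochs=5) as before; the mean hidden activation
# should drop noticeably compared to the vanilla Autoencoder.
```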
@@ -104,6 +104,6 @@ The learned representation of Autoencoder can be used for dimensionality reducti

 ## References

-1. <https://en.wikipedia.org/wiki/Autoencoder>
-2. <https://blog.keras.io/building-autoencoders-in-keras.html>
+1. https://en.wikipedia.org/wiki/Autoencoder
+2. https://blog.keras.io/building-autoencoders-in-keras.html
 3. Rifai, Salah, et al. "Contractive auto-encoders: Explicit invariance during feature extraction." Proceedings of the 28th international conference on machine learning (ICML-11). 2011.

src/content/post/contractive-autoencoder.md renamed to src/content/post/contractive-autoencoder.mdx

Lines changed: 18 additions & 18 deletions
@@ -9,15 +9,19 @@ In the last post, we have seen many different flavors of a family of methods cal

 The idea of Contractive Autoencoder is to make the learned representation to be robust towards small changes around the training examples. It achieves that by using different penalty term imposed to the representation.

-The loss function for the reconstruction term is similar to previous Autoencoders that we have been seen, i.e. using \\( \ell_2 \\) loss. The penalty term, however is more complicated: we need to calculate the representation's jacobian matrix with regards of the training data.
+The loss function for the reconstruction term is similar to previous Autoencoders that we have been seen, i.e. using $\ell_2$ loss. The penalty term, however is more complicated: we need to calculate the representation's jacobian matrix with regards of the training data.

 Hence, the loss function is as follows:

-$$ L = \lVert X - \hat{X} \rVert_2^2 + \lambda \lVert J_h(X) \rVert_F^2 $$
+$$
+L = \lVert X - \hat{X} \rVert_2^2 + \lambda \lVert J_h(X) \rVert_F^2
+$$

 in which

-$$ \lVert J*h(X) \rVert_F^2 = \sum*{ij} \left( \frac{\partial h_j(X)}{\partial X_i} \right)^2 $$
+$$
+\lVert J_h(X) \rVert_F^2 = \sum_{ij} \left( \frac{\partial h_j(X)}{\partial X_i} \right)^2
+$$

 that is, the penalty term is the Frobenius norm of the jacobian matrix, which is the sum squared over all elements inside the matrix. We could think Frobenius norm as the generalization of euclidean norm.

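Before the analytic derivation in the next hunk, the penalty $\lVert J_h(X) \rVert_F^2$ can also be computed directly with automatic differentiation. This is a sketch under the assumption of TensorFlow 2's `GradientTape` API (which postdates the post), for a single sigmoid encoder layer with made-up weights:

```python
# Contractive penalty via autodiff (illustrative): Frobenius norm of the
# jacobian of the hidden representation h with respect to the input X.
import tensorflow as tf

W = tf.Variable(tf.random.normal([784, 64]))
b = tf.Variable(tf.zeros([64]))
X = tf.random.normal([32, 784])               # a minibatch of 32 inputs

with tf.GradientTape() as tape:
    tape.watch(X)
    h = tf.sigmoid(X @ W + b)                 # hidden representation, 32 x 64

J = tape.batch_jacobian(h, X)                 # per-example jacobians, 32 x 64 x 784
penalty = tf.reduce_sum(J ** 2, axis=[1, 2])  # ||J_h(X)||_F^2 for each example
```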
@@ -27,33 +27,31 @@ Let's calculate the jacobian of the hidden layer of our autoencoder then. Let's

 $$
 \begin{align}
-
 Z_j &= W_i X_i \\[10pt]
 h_j &= \phi(Z_j)
-
 \end{align}
 $$

-where \\( \phi \\) is sigmoid nonlinearity. That is, to get the \\( j\text{-th} \\) hidden unit, we need to get the dot product of the \\( i\text{-th} \\) feature and the corresponding weight. Then using chain rule:
+where $\phi$ is sigmoid nonlinearity. That is, to get the $j\text{-th}$ hidden unit, we need to get the dot product of the $i\text{-th}$ feature and the corresponding weight. Then using chain rule:

 $$
 \begin{align}
-
 \frac{\partial h_j}{\partial X_i} &= \frac{\partial \phi(Z_j)}{\partial X_i} \\[10pt]
 &= \frac{\partial \phi(W_i X_i)}{\partial W_i X_i} \frac{\partial W_i X_i}{\partial X_i} \\[10pt]
 &= [\phi(W_i X_i)(1 - \phi(W_i X_i))] \, W_{i} \\[10pt]
 &= [h_j(1 - h_j)] \, W_i
-
 \end{align}
 $$

-It looks familiar, doesn't it? Because it's exactly how we calculate gradient. The difference is however, that we treat \\( h(X) \\) as a vector valued function. That is, we treat \\( h\_{i}(X) \\) each as a separate output. Intuitively, let's say for example we have 64 hidden units, then we have 64 function outputs, and so we will have a gradient vector for each of those 64 hidden unit. Hence, when we get the derivative of that hidden layer, what we get instead is a jacobian matrix. And as we now know how to calculate the jacobian, we can calculate the penalty term in our loss.
+It looks familiar, doesn't it? Because it's exactly how we calculate gradient. The difference is however, that we treat $h(X)$ as a vector valued function. That is, we treat $h\_{i}(X)$ each as a separate output. Intuitively, let's say for example we have 64 hidden units, then we have 64 function outputs, and so we will have a gradient vector for each of those 64 hidden unit. Hence, when we get the derivative of that hidden layer, what we get instead is a jacobian matrix. And as we now know how to calculate the jacobian, we can calculate the penalty term in our loss.

-Let \\( diag(x) \\) be a diagonal matrix, the matrix form of the above derivative is as follows:
+Let $diag(x)$ be a diagonal matrix, the matrix form of the above derivative is as follows:

-$$ \frac{\partial h}{\partial X} = diag[h(1 - h)] \, W^T $$
+$$
+\frac{\partial h}{\partial X} = diag[h(1 - h)] \, W^T
+$$

-We need to form a diagonal matrix of the gradient of \\( h \\) because if we look carefully at the original equation, the first term doesn't depend on \\( i \\). Hence, for all values of \\( W_i \\), we want to multiply it with the correspondent \\( h_j \\). And the nice way to do that is to use diagonal matrix.
+We need to form a diagonal matrix of the gradient of $h$ because if we look carefully at the original equation, the first term doesn't depend on $i$. Hence, for all values of $W_i$, we want to multiply it with the correspondent $h_j$. And the nice way to do that is to use diagonal matrix.

 As our main objective is to calculate the norm, we could simplify that in our implementation so that we don't need to construct the diagonal matrix:

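That simplification is easy to sanity-check numerically. The following is a quick illustrative snippet (not from the post) for a single input vector with hypothetical random weights:

```python
# Check (illustrative): ||diag[h(1-h)] W^T||_F^2 equals
# sum_j (h_j(1-h_j))^2 * sum_i W_ij^2, so no diagonal matrix is needed.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.random.randn(786)              # one input vector
W = np.random.randn(786, 64)          # input-to-hidden weights

h = sigmoid(x @ W)                    # hidden representation, 64 values

J = np.diag(h * (1 - h)) @ W.T        # explicit jacobian, 64 x 786
norm_explicit = np.sum(J ** 2)
norm_simplified = np.sum((h * (1 - h)) ** 2 * np.sum(W.T ** 2, axis=1))

print(np.allclose(norm_explicit, norm_simplified))   # True
```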
@@ -73,18 +75,16 @@ Translated to code:
 import numpy as np

 # Let's say we have minibatch of 32, and 64 hidden units
-
 # Our input is 786 elements vector
-
 X = np.random.randn(32, 786)
 W = np.random.randn(786, 64)

 Z = np.dot(W, X)
 h = sigmoid(Z) # 32x64

 Wj_sqr = np.sum(W.T**2, axis=1) # Marginalize i (note the transpose), 64x1
-dhj_sqr = (h \* (1 - h))**2 # Derivative of h, 32x64
-J_norm = np.sum(dhj_sqr \* Wj_sqr, axis=1) # 32x1, i.e. 1 jacobian norm for each data point
+dhj_sqr = (h * (1 - h))**2 # Derivative of h, 32x64
+J_norm = np.sum(dhj_sqr * Wj_sqr, axis=1) # 32x1, i.e. 1 jacobian norm for each data point
 ```

 Putting all of those together, we have our full Contractive Autoencoder implemented in Keras:
@@ -103,7 +103,7 @@ outputs = Dense(N, activation='linear')(encoded)
 model = Model(input=inputs, output=outputs)

 def contractive_loss(y_pred, y_true):
-mse = K.mean(K.square(y_true - y_pred), axis=1)
+mse = K.mean(K.square(y_true - y_pred), axis=1)

 W = K.variable(value=model.get_layer('encoded').get_weights()[0]) # N x N_hidden
 W = K.transpose(W) # N_hidden x N
@@ -119,7 +119,7 @@ model.compile(optimizer='adam', loss=contractive_loss)
 model.fit(X, X, batch_size=N_batch, nb_epoch=5)
 ```

-And that is it! The full code could be found in my Github repository: <https://github.com/wiseodd/hipsternet>.
+And that is it! The full code could be found in my Github repository: https://github.com/wiseodd/hipsternet.

 ## References
