Description
Problem
Building an autoencoder for images involves constructing a neural network architecture that effectively (hopefully) extracts the latent features of the image into a compressed, lower-dimensional representation (the encoded layer). It also requires training the model to extract those features, a problem that has already been solved by pretrained models such as VGG16. Building and training a feature extractor from scratch is therefore potentially reinventing the wheel.
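For instance, a pretrained network can serve as a ready-made, frozen feature extractor with no training at all. A minimal sketch, assuming `images` is a hypothetical batch of 224x224 RGB image arrays:

from keras.applications import VGG16
from keras.applications.vgg16 import preprocess_input

# load the pretrained convolutional base and freeze it
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False

# `images` is a hypothetical (n, 224, 224, 3) array of input images
features = base.predict(preprocess_input(images))  # -> (n, 7, 7, 512) feature maps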
Code
Example using VGG16:
# imports for building the autoencoder
import math
from keras.applications import VGG16
from keras.models import Model
from keras.layers import Dense, Flatten, Reshape
from keras.layers import Conv2D, UpSampling2D
# reference dimensions (computed for context; not used directly below)
RESIZE_IMG_DIMS = (224, 224, 3)  # assumed to match the VGG16 input shape below
RESHAPE_DIM = math.prod(RESIZE_IMG_DIMS)
ENCODE_DIM = int(RESHAPE_DIM / 24.5)
# Load VGG16 model without top layers
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
# Freeze the layers of the VGG16 model
for layer in base_model.layers:
    layer.trainable = False
# Encoder layers
encoder_output = Flatten()(base_model.output)
encoder_output = Dense(512, activation='relu')(encoder_output)
encoder_output = Dense(256, activation='relu')(encoder_output)
# Decoder layers
decoder_output = Dense(512, activation='relu')(encoder_output)
decoder_output = Dense(25088, activation='relu')(decoder_output)  # 7 * 7 * 512 = 25088, VGG16's feature-map size
decoder_output = Reshape((7, 7, 512))(decoder_output)
# Reverse VGG16 layers
reverse_vgg16 = Conv2D(512, (3, 3), activation='relu', padding='same')(decoder_output)
reverse_vgg16 = Conv2D(512, (3, 3), activation='relu', padding='same')(reverse_vgg16)
reverse_vgg16 = UpSampling2D((2, 2))(reverse_vgg16)
reverse_vgg16 = Conv2D(512, (3, 3), activation='relu', padding='same')(reverse_vgg16)
reverse_vgg16 = Conv2D(512, (3, 3), activation='relu', padding='same')(reverse_vgg16)
reverse_vgg16 = UpSampling2D((2, 2))(reverse_vgg16)
reverse_vgg16 = Conv2D(256, (3, 3), activation='relu', padding='same')(reverse_vgg16)
reverse_vgg16 = Conv2D(256, (3, 3), activation='relu', padding='same')(reverse_vgg16)
reverse_vgg16 = UpSampling2D((2, 2))(reverse_vgg16)
reverse_vgg16 = Conv2D(128, (3, 3), activation='relu', padding='same')(reverse_vgg16)
reverse_vgg16 = Conv2D(128, (3, 3), activation='relu', padding='same')(reverse_vgg16)
reverse_vgg16 = UpSampling2D((2, 2))(reverse_vgg16)
# Additional upsampling (fifth 2x step; reaching 224 from 7 requires five doublings)
reverse_vgg16 = UpSampling2D((2, 2))(reverse_vgg16)
# Output layer
output_layer = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(reverse_vgg16)
# Create the reverse VGG16 model
autoencoder = Model(inputs=base_model.input, outputs=output_layer)
# check network topology
autoencoder.summary()
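The model above is only defined, not trained. A sketch of how training might look, assuming a hypothetical array `x_train` of images scaled to [0, 1] (the range of the sigmoid output layer):

# reconstruction training: input and target are the same images
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(x_train, x_train, epochs=10, batch_size=32, validation_split=0.1)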
Solution
An autoencoder can be built in which the input is first passed through a pretrained model (such as VGG16), and that model's output (the extracted feature maps) is then fed through a bottleneck of dense layers. It is at this bottleneck that the actual compression occurs. The final step is to decompress: the `reverse_vgg16` layers above mirror VGG16's convolution and pooling blocks with Conv2D and UpSampling2D layers until the original input resolution is reconstructed.
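To use the compressed representation directly, the encoder half can be split out as its own model. A sketch reusing the `base_model` and `encoder_output` tensors defined above, where `x` is a hypothetical batch of images:

# encoder: image -> 256-dimensional code (the bottleneck)
encoder = Model(inputs=base_model.input, outputs=encoder_output)

# codes has shape (n, 256): the compressed representation of each image
codes = encoder.predict(x)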