session 8

Matthijs Van keirsbilck edited this page Mar 29, 2017 · 1 revision

Done:

First, extract images from the video, run face recognition, and crop the mouth. Then create .mat files via the Matlab visualization function (pixels, DCT). These also contain the mouth shape. Probably easier to use a rectangle?
Extract phonemes and labels at the correct time. Get the corresponding .mat file and write to a new .mat file that contains everything (cells of video time, phoneme, image) -> can use this in the script that feeds the network.
Alignment seems reasonable (fix: begin and end = very first and very last frame; others: beginning).
I'll train and see how well it works. If it is bad: I've contacted Eoin Gillen, who made the DB, for information.
Caffe installation, converting the Caffe ResNet model to Lasagne, evaluating the network. Refactoring done for clarity.
The network can now be used for image recognition.
List of all classes for input to the network -> reconfigure so the output layer has 39 neurons.
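
The alignment rule noted above (first and last label pinned to the very first and very last frame, other labels placed at their beginning) could look roughly like the sketch below. This is an illustration only, not the actual script: the frame rate, the `(start, end, label)` tuple layout, and the names `time_to_frame` and `align_phonemes` are all assumptions made for the example.

```python
def time_to_frame(t_seconds, fps, n_frames):
    """Map a timestamp to the nearest frame index, clamped to the valid range."""
    idx = int(round(t_seconds * fps))
    return max(0, min(n_frames - 1, idx))


def align_phonemes(phonemes, fps, n_frames):
    """Attach a frame index to each phoneme.

    phonemes: list of (start_seconds, end_seconds, label), sorted by start time
              (hypothetical layout, assumed for this sketch).
    """
    aligned = []
    last = len(phonemes) - 1
    for i, (start, end, label) in enumerate(phonemes):
        if i == 0:
            frame = 0                      # first label: very first frame
        elif i == last:
            frame = n_frames - 1           # last label: very last frame
        else:
            frame = time_to_frame(start, fps, n_frames)  # others: beginning
        aligned.append({"time": start, "phoneme": label, "frame": frame})
    return aligned


# Toy example: a 2-second clip at an assumed 25 fps (50 frames).
example = [
    (0.00, 0.12, "sil"),
    (0.12, 0.32, "hh"),
    (0.32, 0.52, "ah"),
    (0.52, 2.00, "sil"),
]
records = align_phonemes(example, fps=25, n_frames=50)
frames = [r["frame"] for r in records]
```

Each record keeps the same three pieces of information as the combined .mat cells (video time, phoneme, image), with the frame index standing in for the image.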

Busy:

Downloading the whole dataset (mail to Trinity College). Wrote a script, but hit a permissions problem. Part downloaded manually; will probably need to do all of it that way.
Feeding .mat files to the network (image + phoneme); retraining the FC layer.
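
As a rough picture of what retraining only the FC layer means, here is a minimal sketch in plain NumPy rather than Lasagne. It assumes the pretrained network is frozen and reduced to a fixed feature vector per image, so only a single 39-way softmax layer is trained. The feature size (64), the random toy data, and the names `softmax` and `train_fc` are made up for the example.

```python
import numpy as np

N_CLASSES = 39  # output layer size from the notes


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_fc(features, labels, n_classes=N_CLASSES, lr=0.1, epochs=200, seed=0):
    """Train one fully connected softmax layer on fixed features.

    features: (n_samples, n_features) array, standing in for the output of
              the frozen pretrained layers (an assumption of this sketch).
    labels:   (n_samples,) integer class indices in [0, n_classes).
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    W = rng.normal(0.0, 0.01, size=(d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    losses = []
    for _ in range(epochs):
        probs = softmax(features @ W + b)
        # Mean cross-entropy over the batch.
        losses.append(-np.mean(np.log(probs[np.arange(n), labels] + 1e-12)))
        grad = (probs - onehot) / n        # gradient w.r.t. the logits
        W -= lr * (features.T @ grad)      # full-batch gradient descent step
        b -= lr * grad.sum(axis=0)
    return W, b, losses


# Toy data only: random features and random phoneme indices.
rng = np.random.default_rng(1)
features = rng.normal(size=(200, 64))
labels = rng.integers(0, N_CLASSES, size=200)
W, b, losses = train_fc(features, labels)
```

In the real setup the features would come from the converted ResNet and the labels from the phoneme class list; the update rule here is ordinary full-batch gradient descent, chosen only to keep the sketch short.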

Questions:

The output FC part is only one layer deep. Maybe more? Probably best to test and see how well it goes.
Lasagne: training with learning rate = 0 for FC layer.