session 8
Matthijs Van keirsbilck edited this page Mar 29, 2017
- First, extract the images from the video, run face recognition and crop the mouth. Then create .mat files with the Matlab visualization function (pixels, DCT); these also contain the mouth shape. Probably easier to use a rectangle?
- Extract the phonemes and their labels at the correct times. Get the corresponding .mat file and write a new .mat file that contains everything (cells of video time, phoneme, image); the script that feeds the network can then use this file.
- The alignment seems reasonable. Fix: for begin and end, use the very first and very last frame; for the others, use the beginning. I'll train and see how well it works. If it's bad, I've contacted Eoin Gillen, who made the DB, for information.
- Caffe installation, converted the Caffe ResNet model to Lasagne, evaluated the network. Refactoring done for clarity. The network can now be used for image recognition. Next: make a list of all classes for input to the network, then reconfigure the output layer to 39 neurons.
- Downloading the whole dataset (mailed Trinity College). Wrote a download script, but it ran into a permissions problem. Part of the dataset was downloaded manually; I'll probably need to do all of it that way.
- Feeding the .mat files (image + phoneme) to the network; retraining the FC layer.
- The output FC layers are only one deep. Maybe add more? Probably test first and see how well it goes.
- Lasagne: training with learning rate = 0 for the FC layer.
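For the mouth-crop step (first bullet above), here is a minimal sketch of the rectangle option. It crops a fixed rectangle from a grayscale frame and keeps the low-frequency 2-D DCT coefficients as features. This is plain NumPy, not the Matlab visualization function from the notes. The `box` coordinates (top, left, height, width) are a hypothetical input that would come from the face detector, and `keep=8` is an arbitrary choice.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n (rows = frequencies)."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # sample index
    mat = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] *= np.sqrt(1.0 / n)   # scale DC row for orthonormality
    mat[1:, :] *= np.sqrt(2.0 / n)  # scale remaining rows
    return mat


def dct2(block: np.ndarray) -> np.ndarray:
    """2-D orthonormal DCT-II: transform columns, then rows."""
    rows, cols = block.shape
    return dct_matrix(rows) @ block @ dct_matrix(cols).T


def mouth_features(frame: np.ndarray, box, keep: int = 8):
    """Crop a rectangular mouth region and return it with its top-left DCT coefficients.

    `box` = (top, left, height, width) is a hypothetical mouth rectangle,
    e.g. derived from a face detector's bounding box.
    """
    top, left, height, width = box
    crop = frame[top:top + height, left:left + width].astype(np.float64)
    coeffs = dct2(crop)
    return crop, coeffs[:keep, :keep]  # low frequencies carry most of the energy
```

For a uniform crop, all the energy lands in the DC coefficient `feats[0, 0]`, which is a quick sanity check that the transform is wired correctly.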
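The alignment fix from the notes can be sketched as follows. This assumes "begin and end" refers to the first and last label of an utterance, and that "beginning" means the frame in which a phoneme starts. `align_phonemes`, the `(start, end, label)` tuples in seconds, and the `Sample` record are hypothetical names, not the database's actual format.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Sample:
    time: float       # video time (s) of the selected frame
    phoneme: str      # phoneme label
    frame_index: int  # index into the extracted frames


def align_phonemes(phonemes: Sequence[Tuple[float, float, str]],
                   n_frames: int, fps: float) -> List[Sample]:
    """Pick one video frame per phoneme label.

    `phonemes` holds hypothetical (start_s, end_s, label) tuples in time order.
    """
    samples = []
    last = len(phonemes) - 1
    for i, (start, _end, label) in enumerate(phonemes):
        if i == 0:
            idx = 0                        # begin label -> very first frame
        elif i == last:
            idx = n_frames - 1             # end label -> very last frame
        else:
            idx = math.floor(start * fps)  # others: frame where the phoneme begins
        idx = min(max(idx, 0), n_frames - 1)  # clamp to the available frames
        samples.append(Sample(time=idx / fps, phoneme=label, frame_index=idx))
    return samples
```

Each `Sample` could then be paired with the mouth crop of its `frame_index` and written out (for example with `scipy.io.savemat`) as the combined .mat file of video time, phoneme and image described in the notes.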
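For the 39-neuron output layer and the FC retraining, here is a NumPy sketch of the underlying idea (not Lasagne code). It treats the pretrained network's last hidden activations as fixed features and trains only a new dense softmax layer with 39 outputs, using cross-entropy and full-batch gradient descent. The pretrained layers receive no updates here. In Lasagne, the comparable approach is to pass only the new layer's parameters to the update rule. `train_output_layer`, its hyperparameters, and the random features used to check it are assumptions for illustration.

```python
import numpy as np

N_CLASSES = 39  # output layer size from the notes


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_output_layer(features, labels, n_classes=N_CLASSES,
                       lr=0.1, epochs=200, seed=0):
    """Train only a new dense softmax layer on frozen features.

    `features` stand in for the pretrained network's last hidden activations;
    those layers are never updated here.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    W = rng.normal(scale=0.01, size=(d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    losses = []
    for _ in range(epochs):
        probs = softmax(features @ W + b)
        losses.append(-np.mean(np.sum(onehot * np.log(probs + 1e-12), axis=1)))
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b, losses
```

With real data, `features` would hold the activations for each mouth image, and `labels` the class indices, e.g. built with `{c: i for i, c in enumerate(classes)}` over the list of all classes.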