
session 8

Matthijs Van keirsbilck edited this page Mar 29, 2017 · 1 revision

Done:

  • First, extract images from the video, run face recognition, and crop the mouth. Then generate .mat files via the Matlab visualization function (pixels, DCT). These also contain the mouth shape. Probably easier to use a rectangle?
  • Extract phonemes and labels at the correct times. Get the corresponding .mat file and write everything to a new .mat file (cells of video time, phoneme, image) -> can use this in the script that feeds the network.
  • Alignment seems reasonable (fix: begin and end = very first and very last frame; others: beginning).
    I'll train and see how well it works. If bad -> I've contacted Eoin Gillen, who made the DB, for information.
  • Installed Caffe, converted the Caffe ResNet model to Lasagne, evaluated the network. Refactoring done for clarity.
    The network can now be used for image recognition.
  • List of all classes for input to the network -> reconfigure so the output layer has 39 neurons.
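
The alignment fix and the class list above can be sketched roughly as follows. This is only an illustration under assumptions, not the actual script: the function names, the `(start, end, label)` tuple layout, and the frame rate are hypothetical, and the "begin and end" fix is read here as pinning the first and last phoneme to the very first and very last frame while every other phoneme uses the frame at its start time.

```python
def time_to_frame(t_seconds, fps, n_frames):
    """Map a timestamp to the nearest frame index, clamped to the video length."""
    idx = int(round(t_seconds * fps))
    return max(0, min(n_frames - 1, idx))


def align_phonemes(phonemes, fps, n_frames):
    """Pair each phoneme with one frame index.

    phonemes: list of (start_s, end_s, label) tuples (hypothetical layout).
    Assumed rule: first phoneme -> frame 0, last phoneme -> last frame,
    all others -> frame nearest to the phoneme's start time.
    """
    aligned = []
    last = len(phonemes) - 1
    for i, (start, _end, label) in enumerate(phonemes):
        if i == 0:
            frame = 0
        elif i == last:
            frame = n_frames - 1
        else:
            frame = time_to_frame(start, fps, n_frames)
        aligned.append((frame, label))
    return aligned


def build_class_index(labels):
    """Give every distinct phoneme label a stable integer class id."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}
```

With the full phoneme set, `len(build_class_index(...))` would be the number of output neurons (39 according to the notes above).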

Busy:

  • Downloading the whole dataset (mailed Trinity College). Wrote a script, but hit a permissions problem. Part downloaded manually; will probably need to do all of it that way.
  • Feeding .mat files to the network (image + phoneme); retraining the FC layer.
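
Retraining only the FC layer can be pictured as fitting a softmax classifier on top of fixed features. The sketch below is a plain NumPy stand-in, not the Lasagne code: it assumes the features have already been computed by the frozen ResNet and stored as an `(n_samples, n_features)` array, and all names and hyperparameters are hypothetical.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_fc_layer(features, labels, n_classes, lr=0.1, epochs=200, seed=0):
    """Fit a single fully connected softmax layer with batch gradient descent.

    features: (n_samples, n_features) array from the frozen network (assumed).
    labels:   (n_samples,) integer class ids in [0, n_classes).
    """
    rng = np.random.RandomState(seed)
    n, d = features.shape
    W = 0.01 * rng.randn(d, n_classes)
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        probs = softmax(features.dot(W) + b)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * features.T.dot(grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(features, W, b):
    """Return the most likely class id for every row of features."""
    return np.argmax(features.dot(W) + b, axis=1)
```

For the phoneme task, `n_classes` would be 39. Only `W` and `b` change during training, which is what "retraining the FC layer" amounts to when the rest of the network stays fixed.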

Questions:

  • Output FC layers are only one deep. Maybe more? Probably test and see how well it goes.
  • Lasagne: training with learning rate = 0 for FC layer.
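
On the learning-rate question, assuming it is about holding some layers fixed while others train: a learning rate of 0 for a layer means its SGD update is a no-op. The toy sketch below uses plain Python lists and hypothetical layer names, not Lasagne. In Lasagne itself, `lasagne.layers.get_all_params(layer, trainable=True)` decides which parameters reach the update rule, so leaving a layer's parameters out of that list has the same effect.

```python
def sgd_step(params, grads, learning_rates):
    """One SGD step with a separate learning rate per layer.

    params, grads: dicts mapping layer name -> list of floats (toy weights).
    learning_rates: dict mapping layer name -> float; 0.0 freezes that layer.
    """
    return {
        name: [w - learning_rates[name] * g
               for w, g in zip(params[name], grads[name])]
        for name in params
    }


# Hypothetical two-layer example: "conv" is frozen, "fc" is trained.
params = {"conv": [0.5, -0.2], "fc": [0.1, 0.3]}
grads = {"conv": [1.0, 1.0], "fc": [1.0, -1.0]}
rates = {"conv": 0.0, "fc": 0.1}
updated = sgd_step(params, grads, rates)
```

After the step, `updated["conv"]` equals the original `params["conv"]`, while `updated["fc"]` has moved against its gradient.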
