
MesoNet — Model Notes

What the model does (1–3 paragraphs)

  • Problem it solves: The MesoNet GitHub repository provides two models, Meso-4 and MesoInception, which specialize in detecting videos whose faces have been edited by DeepFake or Face2Face. More specifically, the models select frames from a video, extract the faces, and focus on features such as the eyes to determine the final classification score.

  • Input → output:
    • Images (ideally 256x256) → an array of scores and the expected class; scores are values between 0.0 and 1.0 (1 = real, 0 = fake)
    • Directory of mp4, avi, or mov videos → a dictionary mapping video names to scores between 0.0 and 1.0 (1 = real, 0 = fake)

  • Why it's relevant for AI video detection / deepfake detection: MesoNet provides pretrained weights trained directly on a DeepFake dataset, and the network is specifically designed to identify patterns in DeepFake-edited videos. The weights trained on Face2Face also achieved near-equal detection rates on DeepFake videos.
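
The score convention above (per-frame values in [0, 1], 1 = real, 0 = fake) leaves open how frame scores become a video-level verdict. Below is a minimal sketch of one plausible aggregation — mean score with a 0.5 threshold. Both the aggregation and the threshold are illustrative choices, not taken from the MesoNet repository:

```python
def classify_video(frame_scores, threshold=0.5):
    """Aggregate per-frame MesoNet-style scores into a video-level label.

    Scores follow the convention above: 1 = real, 0 = fake.
    Mean aggregation and the 0.5 threshold are hypothetical defaults.
    """
    if not frame_scores:
        raise ValueError("no frame scores given")
    mean_score = sum(frame_scores) / len(frame_scores)
    label = "real" if mean_score >= threshold else "fake"
    return mean_score, label

# Mostly low per-frame scores -> the video is classified as fake
score, label = classify_video([0.1, 0.2, 0.15, 0.4])
```

In practice the threshold could be tuned on a validation set rather than fixed at 0.5.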

Paper / reference

  • Paper title: MesoNet: a Compact Facial Video Forgery Detection Network
  • Authors / year: Darius Afchar, Vincent Nozick, Junichi Yamagishi, Isao Echizen (4 Sept 2018)
  • Link: https://arxiv.org/abs/1809.00888
  • Key ideas (bullet points):
    • Handles videos after strong degradation from video compression
    • In-depth analysis of DeepFake, Face2Face, and their generation processes
  • Architecture summary (high-level): MesoNet uses a deep learning approach to detect patterns in edited videos, with a very low number of trainable parameters, around 28,000 for each network. Meso-4 performs four successive blocks of convolution, batch normalization, and pooling, followed by fully-connected layers. MesoInception replaces the first two convolutional layers with a variant of the inception module by Szegedy et al. (cited in the MesoNet paper). A diagram of the Meso-4 architecture is provided in the paper and attached below.
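
The ~28,000-parameter figure can be sanity-checked with a quick back-of-the-envelope count. The sketch below assumes the Meso-4 layer configuration shown in the paper's diagram (conv blocks of 8, 8, 16, 16 filters with kernel sizes 3, 5, 5, 5 on a 256x256x3 input, batch norm after each conv, pooling factors 2, 2, 2, 4, then dense layers of 16 and 1 units); it is arithmetic for checking, not code from the repository:

```python
def conv2d_params(kh, kw, cin, cout):
    # one (kh x kw x cin) kernel plus one bias per output filter
    return (kh * kw * cin + 1) * cout

def bn_params(channels):
    # trainable gamma and beta per channel
    return 2 * channels

# Four conv + batch-norm blocks (filter counts and kernel sizes
# as assumed from the paper's Meso-4 diagram)
total = (
    conv2d_params(3, 3, 3, 8) + bn_params(8)
    + conv2d_params(5, 5, 8, 8) + bn_params(8)
    + conv2d_params(5, 5, 8, 16) + bn_params(16)
    + conv2d_params(5, 5, 16, 16) + bn_params(16)
)

# Spatial size after 2x, 2x, 2x, 4x pooling: 256 -> 128 -> 64 -> 32 -> 8,
# so the flattened feature map has 8 * 8 * 16 = 1024 units
total += 1024 * 16 + 16   # dense(16)
total += 16 * 1 + 1       # dense(1), sigmoid output

print(total)  # lands very close to the ~28,000 figure quoted above
```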

What I learned (bullet points)

  • Important details:
    • The model was originally developed in Python 3.5 and should run in its own virtual environment to isolate its package dependencies
    • The detail of the eyes strongly helps MesoNet identify real videos
    • The detail of the background strongly helps it identify DeepFake videos
    • DeepFake editing primarily changes faces and may leave the background unchanged, producing videos that are identical to real ones except for the faces
  • Gotchas / assumptions:
    • Pretrained MesoNet weights are trained on older videos, but present-day DeepFake videos may be more sophisticated.
  • Strengths:
    • Original developers achieved very high detection rates on DeepFake and Face2Face, even on each other's test sets.
    • MesoNet has a very low number of trainable parameters compared to many other models.
  • Weaknesses:
    • Will most likely fail to classify videos that do not focus on faces.
    • Requires legacy software and libraries, many of which are no longer maintained or are incompatible with newer packages.
    • Completely generated videos, where not only the faces but also the backgrounds are synthesized, may greatly reduce accuracy.

How it should be used in our project

  • Expected preprocessing:
    • An optional training set of images to fine-tune the weights to be more relevant
    • Potentially cutting the video down to a shorter length, or extracting certain frames
  • Expected input format: mp4, avi, or mov files, placed in a known directory
  • Metrics typically reported: a score from 0 to 1 representing the model's prediction of real (1) or fake (0)
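
To make the expected input and output concrete, here is a small sketch of collecting the mp4/avi/mov files from a known directory and producing the video-name → score dictionary described above. The helper name and the stand-in scoring function are hypothetical; in practice `score_fn` would run the pretrained classifier on extracted frames:

```python
from pathlib import Path

# Extensions the video pipeline is expected to accept (per the notes above)
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov"}

def score_video_directory(directory, score_fn):
    """Build a {video name: score} dict for all videos in one directory.

    score_fn is a stand-in for the real MesoNet inference step and
    should return a value in [0, 1] (1 = real, 0 = fake).
    """
    scores = {}
    for path in sorted(Path(directory).iterdir()):
        if path.suffix.lower() in VIDEO_EXTENSIONS:
            scores[path.name] = score_fn(path)
    return scores
```

Keeping the directory scan separate from inference makes it easy to swap in real frame extraction and scoring later without changing the output format.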

Screenshots / diagrams (optional)

Figure 4 of the MesoNet paper, a diagram of the network architecture

Open questions

  • Questions to ask in weekly meeting:
  • Things to verify:
    • Can the model function properly with slightly newer versions of Python and its packages?
    • Can the model accurately detect videos generated by present-day DeepFake tools?