When researching potential datasets for deepfake/manipulated video detection, document each candidate in candidates.md using the following table format:
| Field | Description |
|---|---|
| Name | Dataset name |
| Link | URL to dataset page or paper |
| License/Terms | Usage restrictions, research-only, commercial OK, etc. |
| Size | Number of videos/images, total GB if known |
| Labels | What labels are provided (real/fake, manipulation type, etc.) |
| Modality | Video, frames, audio, or combination |
| Split Availability | Does it provide train/val/test splits? |
| Preprocessing Needed | Face extraction, resizing, frame sampling, etc. |
| Known Issues | Quality problems, label noise, download difficulty |
| Why It Fits | Relevance to our project goals |
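As a concrete sketch, a filled-in row in candidates.md could look like the example below. The FaceForensics++ details are recalled from memory, not taken from this document, so verify every field against the official page before recording it.

```markdown
| Name | Link | License/Terms | Size | Labels | Modality | Split Availability | Preprocessing Needed | Known Issues | Why It Fits |
|---|---|---|---|---|---|---|---|---|---|
| FaceForensics++ | https://github.com/ondyari/FaceForensics | Research-only; requires a signed agreement | ~1,000 real + ~4,000 manipulated videos; GB total varies by compression level | Real/fake plus manipulation method (DeepFakes, Face2Face, FaceSwap, NeuralTextures) | Video | Yes (official 720/140/140 video split) | Face cropping, frame sampling, resizing | Three compression levels (raw/c23/c40) change task difficulty; download requires a request form | Covers common face-swap and reenactment manipulations |
```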
When comparing candidates, prioritize:
- License compatibility — Can we use it for the intended research/deployment context?
- Label quality — Are labels reliable and well-documented?
- Size — Large enough for meaningful experiments, small enough to download, store, and process
- Relevance — Does it cover the manipulation types we want to detect?
- Accessibility — Can we actually download and use it?
Deliverables:
- candidates.md — Table of all datasets considered
- selected.md — Documentation for the chosen dataset
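This document does not prescribe a structure for selected.md. A minimal skeleton, assuming it should record the final choice against the comparison criteria above, might be:

```markdown
# Selected Dataset: <name>

## Why selected
How it scored on license, label quality, size, relevance, and accessibility.

## Access and license
Download source, license terms, and any agreements signed.

## Preprocessing plan
Face extraction, frame sampling, resizing, and split handling.

## Known limitations
Label noise, coverage gaps, and anything to watch during evaluation.
```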