When researching potential datasets for deepfake/manipulated video detection, document each candidate in candidates.md using the following table format:
| Field | Description |
|---|---|
| Name | Dataset name |
| Link | URL to dataset page or paper |
| License/Terms | Usage restrictions, research-only, commercial OK, etc. |
| Size | Number of videos/images, total GB if known |
| Labels | What labels are provided (real/fake, manipulation type, etc.) |
| Modality | Video, frames, audio, or combination |
| Split Availability | Does it provide train/val/test splits? |
| Preprocessing Needed | Face extraction, resizing, frame sampling, etc. |
| Known Issues | Quality problems, label noise, download difficulty |
| Why It Fits | Relevance to our project goals |
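As a concrete sketch, a filled-in row in candidates.md could look like the example below. The FaceForensics++ details are recalled from memory, not taken from this document, so verify every field against the official page before recording it.

```markdown
| Name | Link | License/Terms | Size | Labels | Modality | Split Availability | Preprocessing Needed | Known Issues | Why It Fits |
|---|---|---|---|---|---|---|---|---|---|
| FaceForensics++ | https://github.com/ondyari/FaceForensics | Research-only; requires a signed agreement | ~1,000 real + ~4,000 manipulated videos; GB total varies by compression level | Real/fake plus manipulation method (DeepFakes, Face2Face, FaceSwap, NeuralTextures) | Video | Yes (official 720/140/140 video split) | Face cropping, frame sampling, resizing | Three compression levels (raw/c23/c40) change task difficulty; download requires a request form | Covers common face-swap and reenactment manipulations |
```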
When comparing candidates, prioritize:
- License compatibility — Can we use it for the intended research/deployment context?
- Label quality — Are labels reliable and well-documented?
- Size — Large enough for meaningful experiments, small enough to download, store, and process
- Relevance — Does it cover the manipulation types we want to detect?
- Accessibility — Can we actually download and use it?
Deliverables:
- candidates.md — Table of all datasets considered
- selected.md — Documentation for the chosen dataset
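This document does not prescribe a structure for selected.md. A minimal skeleton, assuming it should record the final choice against the comparison criteria above, might be:

```markdown
# Selected Dataset: <name>

## Why selected
How it scored on license, label quality, size, relevance, and accessibility.

## Access and license
Download source, license terms, and any agreements signed.

## Preprocessing plan
Face extraction, frame sampling, resizing, and split handling.

## Known limitations
Label noise, coverage gaps, and anything to watch during evaluation.
```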