Releases: TIBHannover/cross-modal_entity_consistency

Datasets

03 Jun 10:31

Pre-release

This repository contains the TamperedNews (Link) and News400 (Link) datasets used in the paper. The datasets include:

  • dataset.jsonl containing:
    • Web links to the news texts
    • Web links to the news images
    • Outputs of the named entity linking and disambiguation (NERD) approach
    • Untampered and tampered entities
  • One <entity>.jsonl file per entity type, containing the following information for each entity:
    • Wikidata ID
    • Wikidata label
    • Meta information used for tampering
    • Web links to all reference images crawled from Google, Bing, and Wikidata
  • Splits for testing and validation
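
Since the files listed above use the JSON Lines format (one JSON object per line), they can be read with the Python standard library alone. The following is a minimal sketch, not the authors' loading code: the helper names `read_jsonl` and `first_record_keys` are hypothetical, and no field names are assumed because the release notes do not specify the record schema.

```python
import json
from pathlib import Path
from typing import Iterator, List, Union


def read_jsonl(path: Union[str, Path]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)


def first_record_keys(path: Union[str, Path]) -> List[str]:
    """Return the top-level keys of the first record, or [] for an empty file.

    Hypothetical helper for inspecting the schema, which is not documented here.
    """
    for record in read_jsonl(path):
        return list(record.keys())
    return []
```

Calling `first_record_keys("dataset.jsonl")` on a downloaded copy (the local path is an assumption) is one way to see which fields each record actually provides before writing any further processing code.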