Data

The data is available on Zenodo:

FullData.csv.gz: Contains links to all claims in the data-set.

  • publishing_date: Date on which the fact-check was published.
  • claim_date: Date on which the claim was made.
  • verdict: Rating given by the fact-checking organisation.
  • language: Language of the claim.
  • cluster_{threshold}: ID of the cluster the claim belongs to at the given similarity threshold (one column per threshold). An entry of "0" means the claim is a singleton and is not clustered with any other claim.
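
The column layout above can be read with the standard library alone. The following is a minimal sketch, not the authors' loading code: the helper names `load_claims` and `group_by_cluster` are hypothetical, the name of the cluster column is passed in because the actual threshold values are not listed here, and it assumes singletons are stored as the literal string "0".

```python
import csv
import gzip
from collections import defaultdict


def load_claims(path):
    """Read the gzip-compressed CSV into a list of row dictionaries."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle))


def group_by_cluster(rows, cluster_column):
    """Group rows by cluster ID, skipping singletons.

    Assumption: a singleton is marked with the string "0" in the
    cluster column, as described for cluster_{threshold}.
    """
    clusters = defaultdict(list)
    for row in rows:
        cluster_id = row[cluster_column]
        if cluster_id != "0":
            clusters[cluster_id].append(row)
    return dict(clusters)
```

For example, `group_by_cluster(load_claims("FullData.csv.gz"), column)` would return one list of claims per cluster, where `column` is whichever `cluster_{threshold}` column is present in the file.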

Embeddings.npy: Contains a dictionary linking each claim to its embedding, calculated with LaBSE.
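
A dictionary stored in a `.npy` file is usually a pickled Python object, so it has to be loaded with `allow_pickle=True` and unwrapped. The sketch below assumes the file was written with `np.save` on a plain dict; the helper names `load_embeddings` and `cosine_similarity` are hypothetical, and the key type (claim text, ID, or link) is not specified here. Pickled files should only be loaded from a trusted source.

```python
import numpy as np


def load_embeddings(path):
    """Load a dict that was saved with np.save and return it as a plain dict."""
    # np.save wraps a dict in a 0-dimensional object array; .item() unwraps it.
    return np.load(path, allow_pickle=True).item()


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

With the dictionary loaded, `cosine_similarity(embeddings[key_a], embeddings[key_b])` gives a similarity score for any two claims.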

Code

File Descriptions:

  • 00_FactCheckersMap.ipynb - Creates maps visualising the number of fact-checks and fact-checking organisations per country.
  • 01_CreateData.ipynb - Parses the scraped fact-checks and the Data Commons fact-check dump. Removes duplicates and cleans the data.
  • 02_CleanClaims.ipynb - Data cleaning for the claim entries in the data-set.
  • 03_EmbeddClaims.ipynb - Embeds the claims and structures the data for similarity comparison using Annoy indexing. Exports an edge list of similar claims.
  • 04_Clustering.ipynb - Creates clusters of the most similar claims at each threshold and runs the analysis.
  • 05_Translate.ipynb - Translates all claims to English.
  • 06_Tokenanalysis.ipynb - Analyzes token usage to identify and differentiate long-lasting and transient terms in claims across languages.
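
To illustrate how an edge list of similar claims can be turned into threshold-based clusters, here is a minimal sketch. It assumes clusters are the connected components of the edges whose similarity meets the threshold; the notebooks may use a different procedure. The function name `cluster_by_threshold` and the `(claim_a, claim_b, similarity)` edge format are hypothetical. Singletons receive ID 0, matching the convention of the cluster_{threshold} columns.

```python
def cluster_by_threshold(claim_ids, edges, threshold):
    """Label claims by connected component over edges with similarity >= threshold.

    claim_ids: iterable of hashable claim identifiers.
    edges: iterable of (claim_a, claim_b, similarity) tuples (assumed format).
    Returns a dict mapping each claim ID to a cluster ID; singletons get 0.
    """
    claim_ids = list(claim_ids)
    parent = {cid: cid for cid in claim_ids}

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the components of every sufficiently similar pair.
    for a, b, similarity in edges:
        if similarity >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Collect the members of each component.
    members = {}
    for cid in claim_ids:
        members.setdefault(find(cid), []).append(cid)

    # Number the multi-claim components from 1; singletons stay at 0.
    labels = {}
    next_id = 1
    for group in members.values():
        if len(group) == 1:
            labels[group[0]] = 0
        else:
            for cid in group:
                labels[cid] = next_id
            next_id += 1
    return labels
```

Running the function once per threshold yields one label mapping per threshold, which corresponds to one cluster_{threshold} column each. Lower thresholds admit more edges and therefore merge more claims into the same cluster.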