dorianquelle/Lost-In-Translation

Data

The data is available on Zenodo:

FullData.csv.gz: Contains links to all claims in the data-set.

  • publishing_date: Date on which the fact-check was published.
  • claim_date: Date on which the claim was made.
  • verdict: Rating given by the fact-checking organisation.
  • language: Language of the claim.
  • cluster_{threshold}: ID of the cluster the claim belongs to at the given similarity threshold, with one column per threshold. An entry of "0" means the claim is a singleton and is not clustered with any other claim.
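
The column layout above can be explored without any third-party packages. The following is a minimal sketch, not code from the repository: it assumes the file is a gzip-compressed CSV with a header row, that cluster IDs are stored as plain strings such as "0", and it uses a hypothetical column name `cluster_0.9` (check the actual header for the thresholds that exist).

```python
import csv
import gzip
from collections import Counter


def cluster_sizes(path, threshold_column):
    """Count claims per cluster in one cluster_{threshold} column.

    Rows whose cluster ID is "0" are singletons and are skipped.
    Assumption: IDs are stored as plain strings such as "0" or "17".
    """
    sizes = Counter()
    # "rt" decompresses and decodes in one step; newline="" is what csv expects.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle):
            cluster_id = row[threshold_column]
            if cluster_id != "0":
                sizes[cluster_id] += 1
    return sizes


# Hypothetical usage; the column name is an assumption, not taken from the data:
# sizes = cluster_sizes("FullData.csv.gz", "cluster_0.9")
# print(sizes.most_common(5))
```

Reading row by row keeps memory use low; if pandas is available, `pandas.read_csv` can read the compressed file directly instead.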

Embeddings.npy: Contains a dictionary linking each claim to its embedding calculated with LaBSE.
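
A dictionary saved with `np.save` is stored as a pickled zero-dimensional object array, so it has to be loaded with `allow_pickle=True` and unwrapped with `.item()`. The sketch below assumes that Embeddings.npy was written this way; the key format (claim text, link, or ID) is not stated here, so the lookups in the usage comment are placeholders. Only unpickle files from a source you trust.

```python
import numpy as np


def load_embeddings(path):
    """Load a dictionary that was stored with np.save.

    Assumption: the file holds a single pickled dict mapping a claim key
    to its embedding vector.
    """
    # allow_pickle=True is required for object arrays; .item() unwraps the dict.
    return np.load(path, allow_pickle=True).item()


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical usage; the keys are placeholders, not taken from the data:
# embeddings = load_embeddings("Embeddings.npy")
# print(cosine_similarity(embeddings[key_a], embeddings[key_b]))
```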

Code

File Descriptions:

  • 00_FactCheckersMap.ipynb - Creates maps visualising the number of fact-checks and fact-checking organisations per country.
  • 01_CreateData.ipynb - Parses the scraped fact-checks and the Data Commons fact-check dump. Removes duplicates and cleans the data.
  • 02_CleanClaims.ipynb - Data cleaning for the claim entries in the data-set.
  • 03_EmbeddClaims.ipynb - Embeds the claims and structures the data for similarity comparison using Annoy indexing. Exports an edge list of similar claims.
  • 04_Clustering.ipynb - Creates clusters of the most similar claims for each threshold and runs the analysis.
  • 05_Translate.ipynb - Translates all claims to English.
  • 06_Tokenanalysis.ipynb - Analyzes token usage to identify and differentiate long-lasting and transient terms in claims across languages.
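
To illustrate how an edge list of similar claims can become per-threshold cluster IDs, here is a minimal sketch that keeps only edges at or above a threshold and labels the resulting connected components with a union-find structure. This is an assumption about one plausible approach, not necessarily what 04_Clustering.ipynb does; the function name, the `(claim_a, claim_b, similarity)` edge format, and the example values are all hypothetical. Singletons receive ID 0, matching the convention of the cluster_{threshold} columns.

```python
def cluster_by_threshold(claim_ids, edges, threshold):
    """Label connected components of the similarity graph.

    edges: iterable of (claim_a, claim_b, similarity) tuples (assumed format).
    Returns {claim_id: cluster_id}, where 0 marks a singleton.
    """
    parent = {claim: claim for claim in claim_ids}

    def find(x):
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two components of every edge that passes the threshold.
    for a, b, similarity in edges:
        if similarity >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    # Group claims by their component root.
    members = {}
    for claim in claim_ids:
        members.setdefault(find(claim), []).append(claim)

    # Number multi-claim components from 1; singletons get 0.
    labels = {}
    next_id = 1
    for group in members.values():
        if len(group) == 1:
            labels[group[0]] = 0
        else:
            for claim in group:
                labels[claim] = next_id
            next_id += 1
    return labels


# Hypothetical example data:
example_edges = [("a", "b", 0.95), ("b", "c", 0.85), ("c", "d", 0.50)]
strict = cluster_by_threshold(["a", "b", "c", "d"], example_edges, 0.9)
loose = cluster_by_threshold(["a", "b", "c", "d"], example_edges, 0.8)
```

Lowering the threshold admits more edges, so clusters can only merge or grow, which is why one column per threshold is a natural way to store the results.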

About

Code for paper: "Lost in Translation: Using Global Fact-Checks to Measure Multilingual Misinformation Prevalence, Spread, and Evolution"
