A project to represent and visualize COVID-19 information in Wikidata
In this research project, we investigate whether openly licensed knowledge graphs can represent COVID-19 information in a fully structured format, and whether a synthesis of that information can be visualized using SPARQL. Our study evaluates this hypothesis for COVID-19 information in Wikidata. This GitHub repository contains two folders with material supporting the results of this work:
- "Docs": This folder contains the source data for several figures and tables of the study:
- Table 3: Languages ranked according to several variables, based on Wikidata queries (as of August 11, 2020). The Medical Wikipedia query yields Wikipedia articles associated with Wikidata items that have a Disease Ontology ID (P699) or that are in the tree of any of the following classes: medicine (Q11190), disease (Q12136), medical procedure (Q796194) or medication (Q12140). The Medical Wikidata labels query yields labels of Wikidata items that have a Disease Ontology ID (P699) or a MeSH Descriptor ID (P486), or that are in the tree of any of the same four classes. The Wikidata users column is a snapshot from the Wikidata dashboard that ranks Wikipedia languages by the number of Wikidata users who also edit the Wikipedia in that language. Style code: italic for languages appearing in all four lists; bold for those appearing in only one.
- Table 4: Languages ranked according to several COVID-19-related variables (as of August 13, 2020). The COVID Wikidata content query sorts languages by the number of labels of Wikidata items with a direct link to and/or from any of the three core COVID-19 items, namely Q84263196 (COVID-19), Q81068910 (COVID-19 pandemic) and Q82069695 (SARS-CoV-2), excluding items about humans (3131) or scholarly publications (40164). The COVID Wikipedia pages query restricts these Wikidata items to those with associated Wikipedia articles and sorts languages by the number of such articles. The values in the COVID Wikipedia edits column are the revision counts per Wikipedia language, taken from the dashboard that lists Wikimedia projects by total number of revisions to COVID-19-related articles. The COVID-19 pandemic Wikipedia pageviews column gives the average daily user traffic (since January 1, 2020) to the article about the COVID-19 pandemic in the respective language. Style code: italic for languages appearing in all four lists; bold for those appearing in only one.
- Tables 5 to 8: Lists of the most frequently used external identifiers for each class of COVID-19-related Wikidata items.
- Fig 8B: Co-occurrence of topics in publications that have one of the COVID-19-related items as a topic, with ribbon widths proportional to the number of publications sharing those topics (log scale). Topics are coloured by group as determined by Louvain clustering; topics shared in fewer than 5 publications are omitted.
- Archive-URL: Internet Archive links for the URLs cited in "Representing COVID-19 information in collaborative knowledge graphs: a study of Wikidata", created with ArchiveNow.
- "Query": This folder contains sample SPARQL queries developed to visualize COVID-19 information in Wikidata. Visualizations of these queries are available at https://speed.ieee.tn.
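As an illustration of the Medical Wikidata labels query behind Table 3, the Python function below assembles a comparable SPARQL query for the Wikidata Query Service. It is a sketch, not the query used in the study: we assume that being "in the tree of" a class means being an instance or subclass of it at any depth (`(wdt:P31|wdt:P279)/wdt:P279*`), and the helper name `medical_labels_query` is hypothetical. Since Wikidata allows at most one label per item and language, counting distinct items per language gives the number of labels.

```python
# Sketch only: an approximation of the "Medical Wikidata labels" query (Table 3).
# Assumption: "in the tree of" a class = instance or subclass of it, at any depth.

MEDICAL_CLASSES = ["Q11190", "Q12136", "Q796194", "Q12140"]  # medicine, disease, medical procedure, medication
ID_PROPERTIES = ["P699", "P486"]  # Disease Ontology ID, MeSH descriptor ID


def medical_labels_query(id_properties, classes):
    """Build a SPARQL query counting labels per language for medical items (hypothetical helper)."""
    # One UNION branch per identifier property: the item has any value for it.
    branches = [f"{{ ?item wdt:{p} [] . }}" for p in id_properties]
    # One more branch for membership in any of the class trees.
    values = " ".join(f"wd:{q}" for q in classes)
    branches.append(
        "{ VALUES ?class { " + values + " }\n"
        "      ?item (wdt:P31|wdt:P279)/wdt:P279* ?class . }"
    )
    union = "\n    UNION ".join(branches)
    # DISTINCT drops items matched by several branches before labels are counted.
    return f"""SELECT ?lang (COUNT(?item) AS ?labels) WHERE {{
  {{ SELECT DISTINCT ?item WHERE {{
    {union}
  }} }}
  ?item rdfs:label ?label .
  BIND(LANG(?label) AS ?lang)
}}
GROUP BY ?lang
ORDER BY DESC(?labels)"""


print(medical_labels_query(ID_PROPERTIES, MEDICAL_CLASSES))
```

Passing only `["P699"]` yields the item set described for the Medical Wikipedia query, although counting articles rather than labels would need a different final pattern. Transitive paths over large class trees such as disease are expensive, so the public endpoint may time out on the full query.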
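The COVID Wikidata content and COVID Wikipedia pages queries for Table 4 can be approximated with the sketch below, which is not the study's exact query. It treats a "direct link" as any truthy statement (a `wdt:` property) between an item and one of the three core items, in either direction. It also assumes that the excluded humans and scholarly publications are instances of human (Q5) or scholarly article (Q13442814). The helper name `covid_content_query` is hypothetical.

```python
# Sketch only: approximates the Table 4 "COVID Wikidata content" and
# "COVID Wikipedia pages" queries; not the exact queries used in the study.

CORE_ITEMS = ["Q84263196", "Q81068910", "Q82069695"]  # COVID-19, COVID-19 pandemic, SARS-CoV-2
EXCLUDED_CLASSES = ["Q5", "Q13442814"]  # assumption: human, scholarly article


def covid_content_query(count_articles=False):
    """Count labels (or Wikipedia articles) per language for items linked to the core items."""
    core = " ".join(f"wd:{q}" for q in CORE_ITEMS)
    exclusions = "\n".join(
        f"      FILTER NOT EXISTS {{ ?item wdt:P31 wd:{q} . }}" for q in EXCLUDED_CLASSES
    )
    if count_articles:
        # One row per Wikipedia article (sitelink) about the item.
        per_language = ("?article schema:about ?item ;\n"
                        "           schema:inLanguage ?lang ;\n"
                        '           schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .')
    else:
        # One row per label; Wikidata allows one label per item and language.
        per_language = "?item rdfs:label ?label .\n  BIND(LANG(?label) AS ?lang)"
    # ?p ranges over direct-claim predicates, matched in both directions.
    return f"""SELECT ?lang (COUNT(*) AS ?n) WHERE {{
  {{ SELECT DISTINCT ?item WHERE {{
      VALUES ?core {{ {core} }}
      ?prop wikibase:directClaim ?p .
      {{ ?item ?p ?core . }} UNION {{ ?core ?p ?item . }}
{exclusions}
  }} }}
  {per_language}
}}
GROUP BY ?lang
ORDER BY DESC(?n)"""


print(covid_content_query())
print(covid_content_query(count_articles=True))
```

Grouping articles by `schema:inLanguage` only approximates a per-Wikipedia count, and the unrestricted `?p` pattern may be slow on the public endpoint.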
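A query in the spirit of Tables 5 to 8, which ranks external-identifier properties by how many items of a given class use them, might look like the sketch below. The class QID is a parameter because the classes covered by those tables are not listed here. Disease (Q12136) in the usage line is only an illustration, the helper name is hypothetical, and a faithful version would also restrict `?item` to COVID-19-related items.

```python
# Sketch only: rank external-identifier properties by usage within one class.


def external_id_usage_query(class_qid, limit=20):
    """Build a SPARQL query listing the most used external IDs for instances of class_qid."""
    return f"""SELECT ?prop ?propLabel (COUNT(DISTINCT ?item) AS ?items) WHERE {{
  ?item wdt:P31 wd:{class_qid} .
  # Keep only properties whose datatype is "external identifier".
  ?prop wikibase:propertyType wikibase:ExternalId ;
        wikibase:directClaim ?claim .
  ?item ?claim [] .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
GROUP BY ?prop ?propLabel
ORDER BY DESC(?items)
LIMIT {limit}"""


# Illustration only: Q12136 (disease) stands in for a COVID-19-related class.
print(external_id_usage_query("Q12136", limit=10))
```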
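The co-occurrence counts behind a chord diagram such as Fig 8B can be computed as in this sketch. It assumes that each publication is given as a collection of its topics, interprets "shared in fewer than 5 publications" as a threshold on topic pairs, and uses a base-10 logarithm for the ribbon widths. The function names and the sample data are hypothetical, and the Louvain grouping used to colour the topics is not reproduced.

```python
import math
from collections import Counter
from itertools import combinations


def topic_cooccurrence(publications, min_shared=5):
    """Count the publications sharing each unordered topic pair; drop rare pairs."""
    pairs = Counter()
    for topics in publications:
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(set(topics)), 2):
            pairs[(a, b)] += 1
    return {pair: n for pair, n in pairs.items() if n >= min_shared}


def ribbon_widths(cooccurrence):
    """Map each pair count to a log10 ribbon width."""
    return {pair: math.log10(n) for pair, n in cooccurrence.items()}


# Hypothetical sample: 10 publications share one topic pair, 3 share another.
sample = [{"COVID-19", "SARS-CoV-2"}] * 10 + [{"COVID-19", "vaccine"}] * 3
counts = topic_cooccurrence(sample)
print(counts)                 # {('COVID-19', 'SARS-CoV-2'): 10}
print(ribbon_widths(counts))  # {('COVID-19', 'SARS-CoV-2'): 1.0}
```

For the clustering step, NetworkX (version 2.8 and later) provides `louvain_communities` under `networkx.community`. It could be applied to a graph whose edge weights are these counts.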
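Besides the visualizations hosted at https://speed.ieee.tn, the sample queries can also be run programmatically against the public Wikidata Query Service endpoint, which returns SPARQL JSON results. The sketch below uses only the Python standard library. Wikimedia's User-Agent policy asks clients to identify themselves, so the User-Agent string here is a placeholder to replace with real contact details. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
# Placeholder: replace with a real tool name and contact address.
USER_AGENT = "covid-wikidata-sketch/0.1 (contact: example@example.org)"


def build_request(query):
    """Prepare a GET request asking the endpoint for SPARQL JSON results."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": USER_AGENT,
    })


def run_query(query, timeout=60):
    """Send the query and flatten each result row to {variable: value}."""
    with urllib.request.urlopen(build_request(query), timeout=timeout) as response:
        data = json.load(response)
    # Each binding maps a variable name to {"type": ..., "value": ...}.
    return [{var: cell["value"] for var, cell in row.items()}
            for row in data["results"]["bindings"]]
```

Calling `run_query` requires network access. Long-running queries may hit the endpoint's time limit, in which case they need to be narrowed or split.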
Turki, H., Hadj Taieb, M. A., Shafee, T., Lubiana, T., Jemielniak, D., Ben Aouicha, M., Labra Gayo, J. E., Banat, M., Das, D., & Mietchen, D. (2020). Representing COVID-19 information in collaborative knowledge graphs: a study of Wikidata. Zenodo. doi:10.5281/zenodo.4028482.
All statistical data and SPARQL queries are released under the CC0 license, which allows free reuse of the released material without any copyright restrictions.