api_or_bulk_downloads Bulk
citation @misc{banda_large-scale_2021, title = {A large-scale {COVID}-19 {Twitter} chatter dataset for open scientific research - an international collaboration}, url = {https://zenodo.org/record/5458943}, abstract = {Version 78 of the dataset...}, urldate = {2021-09-07}, publisher = {Zenodo}, author = {Banda, Juan M. and Tekumalla, Ramya and Wang, Guanyu and Yu, Jingyuan and Liu, Tuo and Ding, Yuning and Artemova, Katya and Tutubalina, Elena and Chowell, Gerardo}, month = sep, year = {2021}, doi = {10.5281/zenodo.5458943}, note = {type: dataset}, keywords = {social media, twitter, nlp, covid-19, covid19}, }
code https://github.com/thepanacealab/covid19_twitter
cost None
description Dataset of tweets related to COVID-19 chatter, acquired from the Twitter Stream. The first 9 weeks of data (from January 1st, 2020 to March 11th, 2020) contain very low tweet counts because we filtered them out of other data we were collecting for other research purposes; however, one can see the dramatic increase as awareness of the virus spread. Dedicated data gathering started on March 11th, yielding over 4 million tweets a day. The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full dataset, along with a cleaned version with no retweets. There are several practical reasons for keeping the retweets; tracing important tweets and their dissemination is one of them. For NLP tasks we provide the top 1000 frequent terms, the top 1000 bigrams, and the top 1000 trigrams. Some general statistics per day are included for both datasets.
documentation http://www.panacealab.org/covid19/
doi 10.5281/zenodo.5458943
error_metrics
last_edit Mon, 19 Jun 2023 16:41:37 GMT
location https://zenodo.org/record/5595136
maintained_by Panacea Lab, http://www.panacealab.org/covid19/
open_access TRUE
record_creation_timestamp 09/07/2021, 16:35:04
references
related_publications https://doi.org/10.3390/epidemiologia2030024, http://doi.org/10.2196/25108, http://doi.org/10.1002/isaf.1482
shortname covid_twitter_chatter
tags
social media
twitter
nlp
covid-19
covid19
covid
open-source
terms_of_use
timeframe 2020-2021
title A large-scale COVID-19 Twitter chatter dataset for open scientific research
uuid 1a7fc85d-38af-4fe6-83b8-0d629e85d418
versioning TRUE