Analyzes how COVID-related social media posts correspond to worldwide COVID-19 testing and hospitalization data.
Reddit Dataset: https://socialgrep.com/datasets/the-reddit-covid-dataset
Twitter Dataset: https://www.kaggle.com/datasets/imoore/covid19-complete-twitter-dataset-daily-updates?select=dailies
Covid Tests Dataset: https://github.com/owid/covid-19-data/tree/master/public/data/testing
Hospitalizations Dataset: https://github.com/owid/covid-19-data/tree/master/public/data/hospitalizations
Dev Resources
Docker images: https://github.com/Marcel-Jan/docker-hadoop-spark
This repository contains the code that analyzes correlations between the Covid-19 Twitter dataset and world Covid statistics.
The First Analysis Approach directory contains the code for the first attempt.
The Second Analysis Approach directory contains the code for the second attempt. (The Spark code is in the README.)
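As a rough illustration of the kind of correlation the analysis looks for, the sketch below aligns daily post counts with a daily Covid statistic by date and computes a Pearson coefficient. It is not the code from either analysis directory: the function names `pearson` and `correlate_by_date`, the dictionary layout, and the sample numbers are all hypothetical, and the choice of Pearson correlation is an assumption made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def correlate_by_date(posts_per_day, stat_per_day):
    """Keep only dates present in both {date: value} mappings, then correlate."""
    shared = sorted(posts_per_day.keys() & stat_per_day.keys())
    return pearson(
        [posts_per_day[d] for d in shared],
        [stat_per_day[d] for d in shared],
    )


# Made-up values for illustration only; real inputs would be daily tweet
# counts and daily test or hospitalization counts from the datasets above.
posts = {"2020-04-01": 10, "2020-04-02": 20, "2020-04-03": 30}
tests = {"2020-04-01": 100, "2020-04-02": 200, "2020-04-03": 300, "2020-04-04": 400}

r = correlate_by_date(posts, tests)
print(f"Pearson r over shared dates: {r:.3f}")
```

Dates that appear in only one of the two series are dropped before the coefficient is computed, so gaps in either dataset do not shift the alignment.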