Detecting advertisement properties in historical copies of the New York Times and the Atlanta Daily World
- Data (Omitted from Repo): Around 10 million historical New York Times and Atlanta Daily World advertisements, articles, cover pages, etc., represented as XML files. The text of these files was produced by OCR software.
-
- Full Text Data
- Publish Date
- Newspaper Publisher
-
- ProQuest Datathon zip files (ours were split into 11 parts for download)
- New York Times & Atlanta Daily World Advertisement csv
-
- AdData.csv (complete CSV with more than 2 million data points of advertisement OCR data)
-
- AdData.csv
-
- TrainingData.csv (1,000 observations picked to train the Named Entity Recognition model)
-
- Training and Testing Data
-
- Recall
- Precision
- F1-Score
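
The three metrics listed above can be illustrated with a minimal sketch. It assumes entity-level scoring, where a predicted entity counts as correct only if both its text and its label match a hand-annotated one. The function name `precision_recall_f1`, the entity labels, and the sample entities are hypothetical and are not taken from this project's code or data.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted entities against gold (hand-labeled) entities.

    Both arguments are iterables of (text, label) tuples. An entity is a
    true positive only when text and label both match exactly.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)

    # Precision: share of predicted entities that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold entities that the model found.
    recall = true_positives / len(gold) if gold else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations for one advertisement (illustration only).
gold = {("Macy's", "ORG"), ("New York", "LOC"), ("$5.00", "PRICE")}
predicted = {("Macy's", "ORG"), ("New York", "ORG")}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this example only one of the two predictions is correct and one of the three gold entities is found, so precision is higher than recall and F1 falls between them.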
Team: News Diggers
Thank you to Amy Zhu, Hui Wen Goh, Noah Kurrack, Zixiao Chen
