Repository for DS-GA 1003 Machine Learning project at NYU: Predicting food safety violations in New York City

jchelmers/urban-data-project

urban-data-project

Group members:

  • Seda Bilaloglu
  • Julie Helmers
  • Jonathan Toy

This repo contains our code and data files for the final project for the DS-GA 1003 Machine Learning and Computational Statistics course taught by Professor David Rosenberg (NYU, Spring 2017). We were advised by Dr. Bonnie Ray.

For our project, we classified New York City restaurants according to their probability of exhibiting two or more critical food safety violations on their next inspection, using data from the New York City Department of Health and Mental Hygiene (DOHMH), 311 Services, Department of Consumer Affairs, New York State Liquor Authority, Weather Underground, OpenStreetMap, and Google Places.
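
The prediction target described above can be written as a binary label per inspection. The following is a minimal sketch only, not the code from the notebooks: it assumes each violation is available as a record with a boolean critical flag, and the field names `restaurant_id`, `inspection_date`, and `critical` are hypothetical rather than taken from the DOHMH schema.

```python
from collections import Counter


def label_inspections(violations, threshold=2):
    """Return {(restaurant_id, inspection_date): 0 or 1}.

    An inspection is labeled 1 when it has at least `threshold`
    critical violations. Field names are hypothetical placeholders.
    """
    critical_counts = Counter()
    inspections = set()
    for v in violations:
        key = (v["restaurant_id"], v["inspection_date"])
        inspections.add(key)  # track inspections with no critical violations too
        if v["critical"]:
            critical_counts[key] += 1
    return {key: int(critical_counts[key] >= threshold) for key in inspections}
```

The threshold of 2 mirrors the "two or more critical violations" definition; how the notebooks actually derive the label may differ.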

Our data sources and other references can be found in the Predicting Food Safety Violations Report.pdf document.


The subfolders contain READMEs with instructions on how to replicate our results. Please preserve the directory structure of the GitHub repository when cloning or downloading the code and data files; otherwise, you will need to modify the relative paths in several places in our code. Also keep in mind that almost all of the Jupyter notebooks generate CSVs that may overwrite the versions you have downloaded from the repository. If you would like to use our versions of the files instead, please comment out the relevant lines (search for calls to the pandas 'to_csv' method).
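
An alternative to commenting out each write is to guard it so that an existing file is never overwritten. The helper below is a hedged sketch and is not part of the notebooks: `to_csv_if_absent` is a hypothetical name, and `df` is assumed to be a pandas DataFrame (any object with a `to_csv(path, ...)` method works).

```python
from pathlib import Path


def to_csv_if_absent(df, path, **kwargs):
    """Write `df` to `path` with df.to_csv unless the file already exists.

    Returns True if the file was written, False if it was left untouched.
    Extra keyword arguments are passed through to df.to_csv.
    """
    path = Path(path)
    if path.exists():
        return False  # keep the downloaded version
    path.parent.mkdir(parents=True, exist_ok=True)  # e.g. create a missing subdirectory
    df.to_csv(path, **kwargs)
    return True
```

In a notebook, a line such as `df.to_csv("prior_violations.csv")` would then become `to_csv_if_absent(df, "prior_violations.csv")`; the DataFrame name `df` here is illustrative.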

Before running the Preprocess_Restaurants.ipynb Jupyter notebook, which extracts violation/inspection history-related features from the New York DOHMH dataset of food safety violations, please download and UNZIP the following file:

This notebook generates the following CSVs:

  • prior_violations.csv in the working directory
  • health_inspect_cleaned.csv in a subdirectory named Heatmap
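
To confirm that the notebook produced both files in the locations listed above, a quick check along these lines can help. This is a sketch under the assumption that it is run from the notebook's working directory; `missing_outputs` is a hypothetical helper, not something provided in the repo.

```python
from pathlib import Path

# Output locations as listed above, relative to the notebook's working directory.
EXPECTED_OUTPUTS = [
    Path("prior_violations.csv"),
    Path("Heatmap") / "health_inspect_cleaned.csv",
]


def missing_outputs(base_dir=".", expected=EXPECTED_OUTPUTS):
    """Return the expected output paths (POSIX-style strings) not found under base_dir."""
    base = Path(base_dir)
    return [p.as_posix() for p in expected if not (base / p).is_file()]
```

An empty list from `missing_outputs()` means both CSVs are present; any returned path points to a file that was not generated or was written elsewhere.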

Before running the Modeling Results.ipynb Jupyter notebook, which produces our final modeling results, please download the following files:

This notebook generates a CSV named GBC_testset_w_confusionmatrix.csv in the working directory.
