Exploratory data analysis of NYC parking violations using Pandas and several visualization libraries
This Jupyter notebook can also be found on Jovian.
Since 2012, New York City has made data publicly available at NYC Open Data. Among the many data sets available is the parking violations data from 2014 through the present (2021). In addition to the NYC Open Data site, it can also be found on Kaggle. The full data set contains 4 years of tickets with 42.3 million rows of data.
It turns out that NYC issues over 10 million parking tickets every year!
This project analyzes fiscal year 2017, which runs from July 1, 2016 through June 30, 2017. One year alone is nearly 2 GB of data, so the notebook is intended to be run on a machine with plenty of memory. I used Google Colab.
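Here is a minimal sketch of how one fiscal year might be loaded and filtered with Pandas while keeping memory use down. It is not the notebook's actual code: the column names ("Registration State", "Issue Date", "Violation Code"), the MM/DD/YYYY date format, and the helper name `load_fiscal_year` are assumptions for illustration, and a tiny in-memory CSV stands in for the real 2 GB file.

```python
import io

import pandas as pd

# Tiny stand-in for the real CSV. Column names and the date format
# are assumptions, not taken from the notebook.
SAMPLE_CSV = io.StringIO(
    "Summons Number,Registration State,Issue Date,Violation Code\n"
    "1,NY,06/30/2016,21\n"
    "2,NY,07/01/2016,38\n"
    "3,NJ,12/25/2016,21\n"
    "4,NY,06/30/2017,14\n"
    "5,CT,07/01/2017,21\n"
)


def load_fiscal_year(source, start="2016-07-01", end="2017-06-30"):
    """Hypothetical helper: load only the needed columns and keep one fiscal year."""
    df = pd.read_csv(
        source,
        # Reading a subset of columns avoids holding unused data in memory.
        usecols=["Registration State", "Issue Date", "Violation Code"],
        # Compact dtypes shrink the frame: categories for repeated strings,
        # a small integer type for the violation code.
        dtype={"Registration State": "category", "Violation Code": "int16"},
    )
    # Unparseable dates become NaT and are dropped by the range filter below.
    df["Issue Date"] = pd.to_datetime(
        df["Issue Date"], format="%m/%d/%Y", errors="coerce"
    )
    # between() includes both endpoints, matching July 1 through June 30.
    in_range = df["Issue Date"].between(pd.Timestamp(start), pd.Timestamp(end))
    return df.loc[in_range].reset_index(drop=True)


fy2017 = load_fiscal_year(SAMPLE_CSV)
print(fy2017)
```

With the real file, the same function could be pointed at the CSV path instead of the sample buffer. Reading fewer columns and using compact dtypes is usually the cheapest way to fit a large file into a Colab session.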
This project uses Pandas, NumPy, and opendatasets, as well as several Python visualization packages, to draw insights from this data. The visualization libraries are:
- Matplotlib
- Seaborn
- Plotly
- Folium
I will be back to incorporate the other years up to and including 2021. This will involve using Dask to handle even larger dataframes. This page will be updated accordingly.
Best
😎 Sam