johnolusetire/traffic-data-analysis

Traffic Congestion and Collisions Analysis in Chicago

This project analyzes the correlation between traffic congestion and collisions in Chicago using big data techniques. Working with large datasets and distributed computing frameworks, we investigate whether congestion levels are related to the frequency or severity of collisions across different areas of the city.

Data Sources

Methodology

  • Data Processing: We use PySpark, the Python API for Apache Spark, to process and transform the large-scale traffic and collision datasets. This includes cleaning the data, joining the datasets, and performing the relevant aggregations.

  • Exploratory Data Analysis: We conduct exploratory data analysis (EDA) to gain insights into the datasets, identify patterns, and visualize key variables related to traffic congestion and collisions.

  • Spatial Analysis: Using geospatial libraries such as GeoPandas, we analyze the spatial distribution of traffic congestion and collisions across different neighborhoods or regions in Chicago.
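
The steps above (clean, aggregate per region, join, correlate) can be sketched in plain Python on a handful of made-up records. This is only an illustration of the logic, not the project's PySpark code: the field names `region` and `speed_mph`, the sample values, and the helper functions are all hypothetical, and average speed is assumed here as a stand-in for congestion level.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical sample records. The real datasets and their column names
# are not listed in this README.
congestion = [
    {"region": "A", "speed_mph": 12.0},
    {"region": "A", "speed_mph": 14.0},
    {"region": "B", "speed_mph": 25.0},
    {"region": "B", "speed_mph": None},  # missing reading, dropped when cleaning
    {"region": "C", "speed_mph": 30.0},
]
collisions = [
    {"region": "A"}, {"region": "A"}, {"region": "A"},
    {"region": "B"}, {"region": "B"},
    {"region": "C"},
]


def mean_speed_by_region(rows):
    """Drop rows with a missing speed, then average speed per region."""
    totals = defaultdict(lambda: [0.0, 0])  # region -> [sum, count]
    for row in rows:
        if row["speed_mph"] is None:
            continue
        totals[row["region"]][0] += row["speed_mph"]
        totals[row["region"]][1] += 1
    return {region: total / n for region, (total, n) in totals.items()}


def collision_count_by_region(rows):
    """Count collision records per region."""
    counts = defaultdict(int)
    for row in rows:
        counts[row["region"]] += 1
    return dict(counts)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


speeds = mean_speed_by_region(congestion)
counts = collision_count_by_region(collisions)

# Inner join on region: keep only regions present in both aggregates.
joined = sorted((r, speeds[r], counts[r]) for r in speeds.keys() & counts.keys())

r_value = pearson([s for _, s, _ in joined], [c for _, _, c in joined])
print(joined)
print(round(r_value, 3))
```

With these invented numbers the coefficient is strongly negative, meaning the slower regions have more collisions, but that says nothing about the real data. On a Spark DataFrame the same steps would typically map to `groupBy` with `agg`, `join`, and `DataFrame.stat.corr`.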

Technologies and Tools

  • PySpark: Python API for Apache Spark, used for distributed big data processing.
  • Azure Virtual Machine: Cloud-hosted virtual machine for running PySpark jobs and managing data.
  • MongoDB: NoSQL database for storing and querying data.
  • Jupyter Notebook: Interactive environment for data analysis and visualization.
  • GeoPandas: Python library for working with geospatial data.

Team Members

  • John Olusetire
  • Timothy Obuadey
  • Anand Seshadri
