"Analysis of the factors contributing to changes in property sales and crime rates in New York City" is a big data analytics project built on a distributed computing architecture. Data is loaded onto the system through a network file system (NFS), and the cluster follows a master-slave architecture in which work is processed in parallel across multiple nodes. Processing is done with PySpark, the Python API for Apache Spark, an open-source framework for distributed big data processing.

The project has three goals. First, it analyzes the correlation between a neighborhood's population, its crime rate, and its average home sale price. Second, it identifies the most commonly reported crimes in the city and recommends measures to prevent them. Third, it analyzes the relationship between unemployment and the crime rate in New York City.

The results are visualized in Tableau, which connects to the distributed system and retrieves the data processed by PySpark. Its interactive dashboards and visualizations let users explore the data and draw their own insights. The findings are intended for businesses and individuals looking to invest in real estate or develop new properties.
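The two core computations described above are a Pearson correlation between per-neighborhood measures (population, crime rate, average sale price, unemployment) and a frequency ranking of reported offense types. In PySpark these would typically be done with `df.stat.corr("col_a", "col_b")` on a joined per-neighborhood DataFrame and with `groupBy(...).count()` sorted in descending order. The minimal sketch below shows the same logic in plain Python so it can be read and run without a Spark cluster. The function names `pearson` and `top_offenses` are hypothetical and are not taken from the repository; the demo values are toy numbers, not project data.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations around each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def top_offenses(offenses, k=3):
    """Return the k most frequently reported offense types as (type, count) pairs."""
    return Counter(offenses).most_common(k)


if __name__ == "__main__":
    # Toy values for illustration only; NOT taken from the project's datasets.
    population = [10, 20, 30, 40]
    crime_rate = [5.0, 4.0, 6.0, 8.0]
    print(pearson(population, crime_rate))
    print(top_offenses(["LARCENY", "ASSAULT", "LARCENY"], k=2))
```

At the project's scale, the Spark equivalents keep the data distributed across the worker nodes: `DataFrame.stat.corr` computes the Pearson coefficient (the only method it currently supports), and a grouped count avoids collecting raw complaint records to the driver.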
anuragswarup/Big-data-Analytics-using-New-York-Dataset