Hadoop-ApacheSpark-Analysis

This project was done in collaboration with my colleagues Roberta Pappolla and Lorenzo Ferri. Its scope was to simulate a machine learning/data science project on a big dataset, so a cluster computing framework was used: Hadoop/Apache Spark. Various ML techniques were applied: classification, clustering, regression, dimensionality reduction, feature engineering, etc.

NOTE: Most notebooks have comments written in Italian, sorry for that! I'm available to clarify anything, just get in touch.

Contributions are more than welcome!

Show some 💚 by starring this repository!