Hadoop-ApacheSpark-Analysis

This project was done in collaboration with my colleagues Roberta Pappolla and Lorenzo Ferri. Its goal was to simulate a machine learning/data science project on a large dataset, so a cluster computing framework was used: Hadoop/Apache Spark. Several ML techniques were applied, including classification, clustering, regression, dimensionality reduction, and feature engineering.

NOTE: Most of the notebook comments are written in Italian, sorry for that! I'm happy to clarify anything, just get in touch.


Contributions are more than welcome!

Show some 💚 by starring this repository!