Supervised-learning-with-heterogenous-data-using-Random-Forest-algorithm

This was a group project comparing the effectiveness of supervised learning on various multivariate datasets. My part used a Random Forest model: I computed the feature importance of the predictor variables and examined how it affects the error (RMSE). I used the Student Performance dataset to illustrate the importance of the individual predictor variables. The implementation is in Python, using NumPy, SciPy, scikit-learn, and pandas, with matplotlib and seaborn for plotting the figures.
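The feature-importance step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn's `RandomForestRegressor` and its impurity-based `feature_importances_`, and it runs on a synthetic table whose column names (`informative`, `weak`, `noise`) are hypothetical stand-ins for the real predictors. The helper `fit_and_score` is likewise an invented name.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def fit_and_score(X, y, seed=0):
    """Fit a random forest; return (importances sorted descending, test RMSE)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
    importances = pd.Series(model.feature_importances_, index=X.columns)
    return importances.sort_values(ascending=False), rmse


# Synthetic stand-in data (assumption): one strong predictor, one weak
# predictor, and one column unrelated to the target.
rng = np.random.default_rng(0)
X = pd.DataFrame(
    {
        "informative": rng.normal(size=400),
        "weak": rng.normal(size=400),
        "noise": rng.normal(size=400),
    }
)
y = 3.0 * X["informative"] + 0.5 * X["weak"] + rng.normal(scale=0.1, size=400)

importances, rmse_full = fit_and_score(X, y)

# Refit without the top-ranked predictor to see how the error responds.
top_feature = importances.index[0]
_, rmse_without_top = fit_and_score(X.drop(columns=[top_feature]), y)

print(importances)
print(f"RMSE with all features:     {rmse_full:.3f}")
print(f"RMSE without {top_feature}: {rmse_without_top:.3f}")
```

Dropping the top-ranked predictor and refitting is only one way to relate importance to RMSE; permutation importance would be a reasonable alternative.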

Datasets used:
1. Wine Quality: http://archive.ics.uci.edu/ml/datasets/Wine+Quality
2. Student Performance: http://archive.ics.uci.edu/ml/datasets/Student+Performance
3. Adult Dataset: https://archive.ics.uci.edu/ml/datasets/Adult
4. Forest Fires: http://archive.ics.uci.edu/ml/datasets/forest+fires

We also used Gaussian mixture model (GMM) sampling to generate synthetic samples from the datasets listed above, then ran the implemented model on those samples and tested its results.
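As a rough sketch of the GMM sampling idea, the snippet below fits scikit-learn's `GaussianMixture` to the joint table of predictors and target, draws synthetic rows with `sample`, trains a random forest on those rows, and scores it on held-out real rows. The data is a synthetic numeric stand-in, and the helper name `gmm_resample`, the number of components, and the train/test split are assumptions rather than the project's settings. Categorical columns would need numeric encoding first, which is not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.mixture import GaussianMixture


def gmm_resample(table, n_samples, n_components=3, seed=0):
    """Fit a GMM to a numeric table and draw n_samples synthetic rows."""
    gmm = GaussianMixture(
        n_components=n_components, covariance_type="full", random_state=seed
    )
    gmm.fit(table)
    synthetic, _ = gmm.sample(n_samples)  # second value is the component labels
    return synthetic


# Stand-in "real" data (assumption): three numeric predictors and a target.
rng = np.random.default_rng(0)
X_real = rng.normal(size=(500, 3))
y_real = 2.0 * X_real[:, 0] - X_real[:, 1] + rng.normal(scale=0.1, size=500)
real = np.column_stack([X_real, y_real])  # last column is the target
train, test = real[:400], real[400:]

# Sample synthetic rows from a GMM fitted on the training portion only.
synthetic = gmm_resample(train, n_samples=400)

# Train on the synthetic rows, evaluate on held-out real rows.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(synthetic[:, :-1], synthetic[:, -1])
predictions = model.predict(test[:, :-1])
rmse_synth = float(np.sqrt(mean_squared_error(test[:, -1], predictions)))

print(f"RMSE on real test rows, trained on GMM samples: {rmse_synth:.3f}")
```

Fitting the mixture on predictors and target together keeps their correlation in the sampled rows, which is what lets a model trained on the synthetic rows transfer to real ones.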