Supervised-learning-with-heterogenous-data-using-Random-Forest-algorithm

This was a group project comparing the effectiveness of supervised learning on several multivariate datasets; my part was doing so with a Random Forest model. I implemented a feature-importance analysis of the predictor variables and measured how it affects the error rate (RMSE). I used the Student Performance dataset to show the relative importance of the various predictor variables. I implemented it in Python using NumPy, SciPy, scikit-learn and pandas, with matplotlib and seaborn for plotting the figures.
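The idea of ranking predictors by importance and watching how RMSE changes can be sketched with scikit-learn as below. This is a minimal illustration, not the project's actual code: it assumes a synthetic numeric dataset in place of the Student Performance data, uses scikit-learn's impurity-based `feature_importances_`, and the helper `rmse_with_features` is hypothetical. It retrains the forest on the top-k features and records the test RMSE for each k.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in for a tabular dataset (the project used
# the UCI Student Performance data, which is not downloaded here).
rng = np.random.default_rng(0)
n_samples, n_features = 500, 6
X = rng.normal(size=(n_samples, n_features))
# Only features 0 and 1 drive the target; the others are pure noise.
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

def rmse_with_features(cols):
    """Hypothetical helper: fit a forest on the given columns, return test RMSE."""
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_train[:, cols], y_train)
    pred = model.predict(X_test[:, cols])
    return float(np.sqrt(mean_squared_error(y_test, pred)))

# Impurity-based importances from a forest trained on all features.
full = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
ranking = np.argsort(full.feature_importances_)[::-1]  # most important first

# Test RMSE when only the k most important features are kept.
rmse_by_k = {k: rmse_with_features(ranking[:k]) for k in range(1, n_features + 1)}
for k, err in rmse_by_k.items():
    print(f"top {k} features -> RMSE {err:.3f}")
```

On data like this, RMSE should drop sharply once both informative features are included and change little after that, which is the kind of curve a feature-importance vs. RMSE plot is meant to reveal.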

Datasets used:

1. Wine Quality http://archive.ics.uci.edu/ml/datasets/Wine+Quality
2. Student Performance http://archive.ics.uci.edu/ml/datasets/Student+Performance
3. Adult Dataset https://archive.ics.uci.edu/ml/datasets/Adult
4. Forest Fires http://archive.ics.uci.edu/ml/datasets/forest+fires

We also used Gaussian Mixture Model (GMM) sampling to generate synthetic samples from each of the datasets above, then trained and tested the implemented model on those samples.
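A minimal sketch of GMM-based sampling with scikit-learn's `GaussianMixture`, assuming all-numeric columns and a synthetic stand-in dataset. The choice of 3 components, fitting features and target jointly, and evaluating on real held-out rows are illustrative assumptions, not details taken from the project. `GaussianMixture.sample` returns both the generated rows and their component labels.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

# Assumption: a synthetic numeric dataset stands in for the UCI data.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 4))
y = 2.0 * X[:, 0] - X[:, 2] + rng.normal(scale=0.3, size=400)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# Fit the GMM on features and target jointly so that sampled rows
# preserve the feature/target relationship (an assumed design choice).
joint = np.column_stack([X_train, y_train])
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=1)
gmm.fit(joint)

# sample() returns (samples, component_labels); only the samples are needed.
synthetic, _ = gmm.sample(n_samples=1000)
X_syn, y_syn = synthetic[:, :-1], synthetic[:, -1]

# Train on synthetic rows, evaluate on real held-out rows.
model = RandomForestRegressor(n_estimators=200, random_state=1).fit(X_syn, y_syn)
rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
print(f"RMSE on real held-out data: {rmse:.3f}")
```

Comparing this RMSE against a forest trained directly on `X_train` is one way to judge how faithfully the GMM samples reproduce the original data.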
