Fundamentals of Data Mining Project
- Developed an end-to-end machine learning solution for predicting automobile loan defaults.
- Performed comprehensive data preprocessing, including null-value handling, missing-data imputation, duplicate removal, outlier treatment, class-imbalance handling, dimensionality reduction, and data normalization.
- Conducted data visualization to uncover key insights and trends in the dataset, aiding feature selection and the understanding of data distributions.
- Trained six machine learning models: Random Forest, CatBoost, LightGBM, GaussianNB, Logistic Regression, and XGBoost.
- Selected Random Forest as the best-performing model after extensive hyperparameter tuning using randomised search.
- Achieved improved prediction accuracy and addressed key challenges related to imbalanced data.
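
The workflow in the bullets above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the data is synthetic, the column names (`loan_amount`, `credit_score`, `age`, `default`) are hypothetical, and the specific choices (median imputation, IQR clipping for outliers, `class_weight="balanced"` for the class imbalance, F1 as the tuning metric, and the small parameter grid) are assumptions made for the example. Dimensionality reduction is left out for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the loan dataset (hypothetical column names).
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "loan_amount": rng.normal(50_000, 15_000, n),
    "credit_score": rng.normal(650, 80, n),
    "age": rng.integers(21, 65, n).astype(float),
})
# Imbalanced target: defaults are more likely at low credit scores.
df["default"] = (df["credit_score"] + rng.normal(0, 60, n) < 580).astype(int)

# Inject missing values and duplicate rows so the cleaning steps have work to do.
df.loc[rng.choice(n, 20, replace=False), "loan_amount"] = np.nan
df = pd.concat([df, df.iloc[:10]], ignore_index=True)

# Remove duplicate rows.
df = df.drop_duplicates().reset_index(drop=True)

# Outlier treatment (assumed approach): clip to 1.5 * IQR beyond the quartiles.
q1, q3 = df["loan_amount"].quantile([0.25, 0.75])
iqr = q3 - q1
df["loan_amount"] = df["loan_amount"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

X, y = df.drop(columns="default"), df["default"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Imputation and normalization live inside the pipeline so they are fitted
# on the training folds only during cross-validation.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(class_weight="balanced", random_state=0)),
])

# Illustrative search space, not the one used in the project.
param_distributions = {
    "model__n_estimators": [50, 100, 200],
    "model__max_depth": [None, 5, 10],
    "model__min_samples_leaf": [1, 2, 5],
}

search = RandomizedSearchCV(
    pipe, param_distributions, n_iter=5, cv=3, scoring="f1", random_state=0
)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("held-out F1:", search.score(X_test, y_test))
```

Scoring on F1 rather than accuracy is one reasonable choice when defaults are the minority class, since plain accuracy can look high for a model that rarely predicts a default.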
We applied three XAI techniques to our final model, the Random Forest classifier, to gain insight into how the model interprets the data and how each feature influences its output: feature importance, permutation feature importance, and partial dependence plots.
Libraries: scikit-learn, pandas, numpy, seaborn, matplotlib, streamlit
Deployment link: https://automobile-loan-default-prediction-system-xujqfzrxxmbapkvhz3hu.streamlit.app/