This project aims to develop a predictive model that classifies students into three possible academic outcomes: Graduate, Enrolled, or Dropout.
The goal is to support higher education institutions in identifying students at risk and promoting timely and personalized interventions.
- Problem: Student dropout is a critical issue with long-term personal and institutional consequences. Predicting student outcomes can improve academic support systems and reduce attrition rates.
- Approach: We use supervised machine learning techniques on real-world educational data to build a robust classification model, integrated with explainability methods and a graphical user interface (GUI).
The dataset was sourced from the UCI Machine Learning Repository and includes:
- 4,424 student records
- 37 attributes per student (demographic, socioeconomic, academic)
- Target variable:
Graduate,Enrolled,Dropout
- Preprocessing: Handling categorical and numerical features, standardization, and one-hot encoding.
- Feature Engineering: Creation of informative features (e.g., pass rates, weighted grades, parental education score).
- Handling Imbalance: SMOTE oversampling for minority classes.
- Models Used: Random Forest, CatBoost, XGBoost, LightGBM, SVM, Gradient Boosting, Decision Tree.
- Model Selection: Hyperparameter tuning via HalvingGridSearchCV with 5-fold cross-validation.
- Evaluation: Macro F1-score, Balanced Accuracy, ROC AUC.
We apply SHAP (SHapley Additive Explanations) to:
- Understand the global importance of features across the dataset
- Interpret individual predictions through waterfall plots
- Compare model transparency and consistency
A Python-based Graphical User Interface (built with Tkinter) allows users to:
- Input student data manually
- View predictions and class probabilities
- Access local SHAP explanations for each prediction
├── data/ # Raw and cleaned datasets
├── models/ # Saved trained models
├── shap_output/ # SHAP values
├── notebook/ # Jupyter notebooks
├── utils/ # Preprocessing and feature engineering modules
├── results/ # Model comparison results
├── app.py/ # GUI application
└── README.md
For a detailed explanation of the methodology, results, and model explainability, refer to the full project documentation:
➡️ Documentation_FABIANI.pdf