Predictive modeling of student academic outcomes using supervised learning and explainable AI - DMML project

martiFabia/Academic-Outcome-Predictor

Predicting Student Dropout in Higher Education Using Supervised Learning

This project aims to develop a predictive model that classifies students into three possible academic outcomes: Graduate, Enrolled, or Dropout.
The goal is to support higher education institutions in identifying students at risk and promoting timely and personalized interventions.

📚 Overview

  • Problem: Student dropout is a critical issue with long-term personal and institutional consequences. Predicting student outcomes can improve academic support systems and reduce attrition rates.
  • Approach: We use supervised machine learning techniques on real-world educational data to build a robust classification model, integrated with explainability methods and a graphical user interface (GUI).

📊 Dataset

The dataset was sourced from the UCI Machine Learning Repository and includes:

  • 4,424 student records
  • 37 attributes per student (demographic, socioeconomic, academic)
  • Target variable with three classes: Graduate, Enrolled, Dropout

🔍 Methodology

  • Preprocessing: Standardization of numerical features and one-hot encoding of categorical features.
  • Feature Engineering: Creation of informative features (e.g., pass rates, weighted grades, parental education score).
  • Handling Imbalance: SMOTE oversampling for minority classes.
  • Models Used: Random Forest, CatBoost, XGBoost, LightGBM, SVM, Gradient Boosting, Decision Tree.
  • Model Selection: Hyperparameter tuning via HalvingGridSearchCV with 5-fold cross-validation.
  • Evaluation: Macro F1-score, Balanced Accuracy, ROC AUC.
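
The choice of Macro F1 and Balanced Accuracy matters because the three classes are imbalanced: both metrics average per-class scores, so a model that favors the majority class is penalized. The sketch below computes them from scratch on a small set of made-up labels (the labels and helper names are illustrative assumptions, not project data or project code). scikit-learn's `f1_score(average="macro")` and `balanced_accuracy_score` compute the same quantities. ROC AUC is left out here because it needs predicted probabilities rather than hard labels.

```python
CLASSES = ["Dropout", "Enrolled", "Graduate"]


def per_class_counts(y_true, y_pred, label):
    """Count true positives, false positives, and false negatives for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    return tp, fp, fn


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in classes:
        tp, fp, fn = per_class_counts(y_true, y_pred, label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def balanced_accuracy(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of the per-class recalls (classes absent from y_true are skipped)."""
    recalls = []
    for label in classes:
        tp, _, fn = per_class_counts(y_true, y_pred, label)
        if tp + fn:
            recalls.append(tp / (tp + fn))
    return sum(recalls) / len(recalls)


# Made-up labels: a classifier that leans toward the majority class.
y_true = ["Graduate"] * 6 + ["Enrolled"] * 2 + ["Dropout"] * 2
y_pred = ["Graduate"] * 6 + ["Graduate", "Enrolled"] + ["Dropout", "Graduate"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy          = {accuracy:.3f}")
print(f"macro F1          = {macro_f1(y_true, y_pred):.3f}")
print(f"balanced accuracy = {balanced_accuracy(y_true, y_pred):.3f}")
```

On these labels plain accuracy is 0.8, while both class-averaged metrics come out lower because half of the minority-class students are misclassified.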

🧠 Explainability

We apply SHAP (SHapley Additive Explanations) to:

  • Understand the global importance of features across the dataset
  • Interpret individual predictions through waterfall plots
  • Compare model transparency and consistency
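
To make the idea behind a waterfall plot concrete, the sketch below computes exact Shapley values for a tiny hand-written scoring function by averaging each feature's marginal contribution over every feature ordering. Everything in it is an illustrative assumption: `toy_risk_score`, its three features, and the baseline student are hypothetical and are not the project's model or data. The SHAP library estimates the same kind of attribution far more efficiently; brute-force enumeration like this is only feasible for a handful of features.

```python
from itertools import permutations


def toy_risk_score(x):
    """Hypothetical dropout-risk score over (pass_rate, tuition_paid, age)."""
    pass_rate, tuition_paid, age = x
    score = 0.8 - 0.6 * pass_rate
    # Interaction: unpaid tuition matters more when the pass rate is low.
    score += 0.3 * (1 - tuition_paid) * (1 - pass_rate)
    score += 0.005 * (age - 18)
    return score


def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]  # switch feature i from baseline to actual value
            updated = model(current)
            phi[i] += updated - previous
            previous = updated
    return [total / len(orderings) for total in phi]


feature_names = ["pass_rate", "tuition_paid", "age"]
baseline = (0.8, 1, 20)  # assumed "typical" student used as the reference point
student = (0.4, 0, 24)   # the student whose prediction is being explained

contributions = shapley_values(toy_risk_score, student, baseline)

print(f"baseline score: {toy_risk_score(baseline):.3f}")
for name, value in zip(feature_names, contributions):
    print(f"  {name:>12}: {value:+.3f}")
print(f"student score:  {toy_risk_score(student):.3f}")
```

The contributions always sum to the difference between the student's score and the baseline score, which is the property a waterfall plot visualizes: it starts at the baseline and stacks one bar per feature until it reaches the prediction.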

🖥️ GUI

A Python-based Graphical User Interface (built with Tkinter) allows users to:

  • Input student data manually
  • View predictions and class probabilities
  • Access local SHAP explanations for each prediction
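
A minimal sketch of how such a Tkinter window could be wired together is shown below. It is not the code in `app.py`: the single input field, `predict_proba_stub` (which returns fixed probabilities in place of a trained model), `format_prediction`, and `build_app` are all hypothetical names introduced for illustration, and the SHAP explanation view is omitted.

```python
def predict_proba_stub(features):
    """Hypothetical stand-in for a trained model; returns fixed class probabilities."""
    return {"Dropout": 0.62, "Enrolled": 0.23, "Graduate": 0.15}


def format_prediction(probabilities):
    """Build the display text: predicted class first, then probabilities high to low."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    lines = [f"Predicted outcome: {ranked[0][0]}"]
    lines += [f"  {label}: {prob:.1%}" for label, prob in ranked]
    return "\n".join(lines)


def build_app():
    """Create the window; call .mainloop() on the returned object to run it."""
    import tkinter as tk  # imported here so the helpers above work without a display

    root = tk.Tk()
    root.title("Academic Outcome Predictor (sketch)")

    tk.Label(root, text="Pass rate (0-1):").grid(row=0, column=0, sticky="w")
    pass_rate_entry = tk.Entry(root)
    pass_rate_entry.grid(row=0, column=1)

    result = tk.StringVar(value="Enter data and press Predict")
    tk.Label(root, textvariable=result, justify="left").grid(
        row=2, column=0, columnspan=2, sticky="w"
    )

    def on_predict():
        # Validate the manual input before calling the (stub) model.
        try:
            features = {"pass_rate": float(pass_rate_entry.get())}
        except ValueError:
            result.set("Pass rate must be a number")
            return
        result.set(format_prediction(predict_proba_stub(features)))

    tk.Button(root, text="Predict", command=on_predict).grid(
        row=1, column=0, columnspan=2
    )
    return root
```

Running `build_app().mainloop()` opens the window. Keeping the formatting logic in a plain function, separate from the widgets, makes it possible to check the displayed text without starting the GUI.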

📂 Project Structure

├── data/ # Raw and cleaned datasets
├── models/ # Saved trained models
├── shap_output/ # SHAP values
├── notebook/ # Jupyter notebooks
├── utils/ # Preprocessing and feature engineering modules
├── results/ # Model comparison results
├── app.py # GUI application
└── README.md

📄 Documentation

For a detailed explanation of the methodology, results, and model explainability, refer to the full project documentation:
➡️ Documentation_FABIANI.pdf
