This machine learning project analyzes and predicts student performance based on various demographic and academic factors. The model aims to predict math scores using features like gender, race/ethnicity, parental education, lunch type, and test preparation course completion status. The project includes a web application for making predictions.
- Installation
- Dataset
- Project Structure
- Usage
- Model Architecture
- Web Application
- Results
- Contributing
- License
git clone https://github.com/ArihantSingla21/student-performance-prediction.git
cd student-performance-prediction
python -m venv venv
source venv/bin/activate  # Linux/macOS
venv\Scripts\activate  # Windows
pip install -r requirements.txt
The dataset contains student performance information:
- Size: 1000 students, 8 features
- Features:
  - Gender
  - Race/ethnicity
  - Parental level of education
  - Lunch type
  - Test preparation course
  - Reading score
  - Writing score
  - Math score (target variable)
- Preprocessing:
  - Categorical features encoded using OneHotEncoder
  - Numerical features scaled using StandardScaler
  - Data split into training (80%) and testing (20%) sets
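The preprocessing steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's actual transformation module: the column names (`gender`, `lunch`, `test_preparation_course`, `reading_score`, `writing_score`, `math_score`) and the small inline DataFrame are assumptions standing in for `data/stud.csv`.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny stand-in for the real dataset; column names and values are assumed.
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male", "female"] * 2,
    "lunch": ["standard", "free/reduced", "standard", "standard", "free/reduced"] * 2,
    "test_preparation_course": ["none", "completed", "none", "completed", "none"] * 2,
    "reading_score": [72, 60, 95, 57, 78, 83, 43, 64, 90, 54],
    "writing_score": [74, 58, 93, 44, 75, 78, 39, 67, 88, 52],
    "math_score": [72, 62, 90, 47, 76, 71, 40, 64, 88, 58],
})

categorical = ["gender", "lunch", "test_preparation_course"]
numerical = ["reading_score", "writing_score"]

X = df.drop(columns=["math_score"])  # features
y = df["math_score"]                 # target variable

# 80% training / 20% testing split, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# One-hot encode the categorical columns and standardize the numerical ones.
preprocessor = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("num", StandardScaler(), numerical),
])

# Fit on the training split only, then reuse the fitted transformer on the test split.
X_train_t = preprocessor.fit_transform(X_train)
X_test_t = preprocessor.transform(X_test)
print(X_train_t.shape, X_test_t.shape)
```

Fitting the transformer on the training split alone keeps test-set statistics out of the scaler, which is the usual reason for splitting before transforming.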
- notebook/
  - EDA.ipynb: Exploratory Data Analysis
  - MODEL TRAINING.ipynb: Model Training and Evaluation
- data/
  - stud.csv: Raw dataset
- src/
  - components/: Contains data ingestion, transformation, and model training modules
  - pipeline/: Contains prediction pipeline for web application
  - utils.py: Utility functions for saving models and evaluating performance
  - exception.py: Custom exception handling
  - logger.py: Logging configuration
- templates/
  - home.html: Main prediction form interface
  - index.html: Landing page
- artifacts/: Stores trained models and preprocessed data
- application.py: Web application entry point
- .ebextensions/: AWS Elastic Beanstalk configuration
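As an illustration of the kind of helper a utilities module for saving models might contain, here is a minimal sketch using `pickle` from the standard library. The function names `save_object` and `load_object`, the file name `model.pkl`, and the dictionary used as a stand-in model are hypothetical; the project's own `utils.py` may differ.

```python
import os
import pickle
import tempfile


def save_object(file_path, obj):
    """Serialize obj to file_path, creating parent directories if needed."""
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    with open(file_path, "wb") as f:
        pickle.dump(obj, f)


def load_object(file_path):
    """Load a previously pickled object from file_path."""
    with open(file_path, "rb") as f:
        return pickle.load(f)


# Round-trip a stand-in "model" through a temporary artifacts directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "artifacts", "model.pkl")
    save_object(path, {"coef": [1.0, 2.0]})
    restored = load_object(path)

print(restored)
```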
The project includes a web interface built with Flask that allows users to:
- Input student information through a user-friendly form
- Get instant predictions for math scores
- Access the application through a deployed web service
To run the application locally:
python application.py
The application can be accessed at http://localhost:5000
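To make the form-to-prediction flow concrete, the following is a minimal Flask sketch under stated assumptions. The `/predict` route, the form field names, and `predict_math_score` are all hypothetical; in particular, the stub simply averages two scores as a placeholder and is not the project's trained model or prediction pipeline.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_math_score(features):
    # Hypothetical placeholder: the real application would call its
    # prediction pipeline here instead of averaging two inputs.
    reading = float(features["reading_score"])
    writing = float(features["writing_score"])
    return round((reading + writing) / 2, 1)


@app.route("/predict", methods=["POST"])  # route name is an assumption
def predict():
    # Form fields arrive as strings; collect them into a plain dict.
    features = request.form.to_dict()
    return jsonify({"math_score": predict_math_score(features)})


# Exercise the endpoint in-process with Flask's test client,
# so no server needs to be running.
client = app.test_client()
response = client.post(
    "/predict", data={"reading_score": "70", "writing_score": "80"}
)
print(response.get_json())
```

Calling `app.run(port=5000)` instead of using the test client would serve the same route on the local port mentioned above.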
Multiple regression models were implemented and compared:
- Linear Regression
- Lasso Regression
- Ridge Regression
- Decision Tree Regressor
- Random Forest Regressor
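A comparison across these five regressors could look roughly like the sketch below. It runs on synthetic data rather than the student dataset, uses default hyperparameters, and ranks by R2 on a held-out split; all of these are assumptions for illustration, and the training notebook may tune or evaluate the models differently.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: a linear signal plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Lasso Regression": Lasso(),
    "Ridge Regression": Ridge(),
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=42),
    "Random Forest Regressor": RandomForestRegressor(random_state=42),
}

# Fit each model on the training split and score it on the test split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

best_name = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: R2 = {score:.3f}")
print("Best model:", best_name)
```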
Key findings from EDA:
- Students who completed the test preparation course showed higher average scores
- Lunch type and gender showed correlations with performance
- Score distributions showed clear patterns across different demographic groups
Model performance metrics include:
- Mean Absolute Error (MAE)
- Root Mean Square Error (RMSE)
- R-squared (R2) score
Detailed performance metrics for each model are available in the training notebook.
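For reference, the three metrics can be computed from their standard definitions as in the sketch below. The function names and the sample scores are illustrative only and are not results from this project.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the mean squared error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r2(y_true, y_pred):
    """R-squared: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up actual and predicted math scores.
y_true = [70, 80, 90, 60]
y_pred = [72, 78, 85, 65]

print(mae(y_true, y_pred))             # 3.5
print(round(rmse(y_true, y_pred), 3))  # 3.808
print(round(r2(y_true, y_pred), 3))    # 0.884
```

MAE and RMSE are in the same units as the math score (lower is better), while R2 is unitless and equals 1 for a perfect fit.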
- Fork the repository
- Create a new branch (git checkout -b feature/improvement)
- Make your changes
- Commit your changes (git commit -am 'Add new feature')
- Push to the branch (git push origin feature/improvement)
- Create a Pull Request
This project is licensed under the MIT License.
Created by ArihantSingla21
For questions or support, please open an issue or contact arihantsingla21@gmail.com.