This repository is the Elastic Beanstalk deployment track on AWS for the Student Math Score Predictor project.
The Docker + ECR + EC2 CI/CD track is maintained in a separate companion repository.
- Elastic Beanstalk deployment (this repo): https://github.com/Affan005-ai/ML-Projects
- Docker + ECR + EC2 CI/CD repo: https://github.com/Affan005-ai/AWS-CI-CD-Projects
Predict a student’s math score from:
- Gender
- Race/Ethnicity
- Parental level of education
- Lunch type
- Test preparation course
- Reading score
- Writing score
- End-to-end ML pipeline from ingestion to artifact generation
- Flask web app for live prediction
- Multi-model regression training
- Hyperparameter tuning with RandomizedSearchCV
- AWS Elastic Beanstalk deployment setup
- Real commit-driven iteration and fixes
- Data ingestion reads Notebook/data/StudentsPerformance.csv
- Train-test split saved to artifacts/train.csv and artifacts/test.csv
- Transformation pipeline handles imputation, encoding, and scaling
- Multiple regressors are trained and tuned
- Best model + preprocessor saved in artifacts/
- Flask app serves inference via /predict
- Elastic Beanstalk hosts the application
Implemented in src/components/data_ingestion.py:
- Reads source dataset
- Saves raw/train/test CSV artifacts
- Uses deterministic split with random_state=42
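The ingestion steps above can be sketched roughly as follows. This is a minimal illustration, not the code in src/components/data_ingestion.py: the function name `ingest`, the `raw.csv` file name, and the 20% test fraction are assumptions; only the train/test file names and `random_state=42` come from this README.

```python
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split


def ingest(source_csv, artifacts_dir="artifacts", test_size=0.2):
    """Read the source CSV and write raw/train/test copies into artifacts_dir.

    test_size=0.2 and the raw.csv name are assumptions made for this sketch.
    """
    out = Path(artifacts_dir)
    out.mkdir(parents=True, exist_ok=True)

    df = pd.read_csv(source_csv)
    df.to_csv(out / "raw.csv", index=False)

    # A fixed random_state makes the split reproducible across runs.
    train_df, test_df = train_test_split(df, test_size=test_size, random_state=42)
    train_df.to_csv(out / "train.csv", index=False)
    test_df.to_csv(out / "test.csv", index=False)
    return out / "train.csv", out / "test.csv"
```

Calling `ingest("Notebook/data/StudentsPerformance.csv")` twice should produce identical train and test files, because the shuffle is seeded.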
Implemented in src/components/data_transformation.py:
- Numerical features: impute median + standard scaling
- Ordinal features: impute most frequent + ordered encoding
- Nominal features: impute most frequent + one-hot encoding
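One way to express the three feature groups above is a scikit-learn ColumnTransformer. The sketch below is illustrative only: the column names, the assignment of columns to the ordinal and nominal groups, and the education ordering are assumptions, not taken from src/components/data_transformation.py.

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Assumed column names and groupings; adjust to match the real dataset.
NUMERICAL = ["reading score", "writing score"]
ORDINAL = ["parental level of education"]
NOMINAL = ["gender", "race/ethnicity", "lunch", "test preparation course"]

# Assumed ordering from lowest to highest education level.
EDUCATION_ORDER = [
    "some high school",
    "high school",
    "some college",
    "associate's degree",
    "bachelor's degree",
    "master's degree",
]


def build_preprocessor():
    """Return an unfitted preprocessor with one pipeline per feature group."""
    numerical = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    ordinal = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OrdinalEncoder(categories=[EDUCATION_ORDER])),
    ])
    nominal = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer(
        [
            ("num", numerical, NUMERICAL),
            ("ord", ordinal, ORDINAL),
            ("nom", nominal, NOMINAL),
        ],
        sparse_threshold=0,  # always return a dense array
    )
```

Fitting the preprocessor on the training frame only, then reusing it on the test frame and at inference time, keeps the imputation and scaling statistics free of test-set leakage.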
Implemented in src/components/model_trainer.py:
- Trains multiple regressors
- Compares test R2 scores
- Selects best model above quality threshold
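The selection step can be sketched as a small helper. The function name `select_best_model` and the `min_r2` parameter are hypothetical, and this README does not state the actual threshold value, so it is left as a required argument. Treating a score exactly equal to the threshold as acceptable is also an assumption.

```python
def select_best_model(test_r2_by_model, min_r2):
    """Return (name, score) of the highest-scoring model.

    Raises ValueError if no model reaches min_r2 (hypothetical quality gate).
    """
    if not test_r2_by_model:
        raise ValueError("no models were evaluated")

    best_name = max(test_r2_by_model, key=test_r2_by_model.get)
    best_score = test_r2_by_model[best_name]

    if best_score < min_r2:
        raise ValueError(
            f"best model {best_name!r} scored {best_score:.3f}, "
            f"below the required {min_r2:.3f}"
        )
    return best_name, best_score
```

Failing loudly when every model falls short keeps a weak model from being saved as the deployment artifact.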
Implemented in src/utils.py via evaluate_models(...):
- Uses RandomizedSearchCV with n_iter=9, cv=3, scoring='r2', n_jobs=-1
- Picks best estimator per model
- Saves best overall model artifact
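A rough sketch of the tuning loop described above follows. It is not the body of evaluate_models(...): the function name `tune_and_score`, the example models, and their search spaces are hypothetical, and the `random_state=42` passed to the search is an added assumption for reproducibility. The n_iter, cv, scoring, and n_jobs defaults mirror the values listed in this README.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical candidates and search spaces, for illustration only.
EXAMPLE_MODELS = {
    "Ridge": Ridge(),
    "Random Forest": RandomForestRegressor(random_state=42),
}
EXAMPLE_PARAM_SPACES = {
    "Ridge": {"alpha": [0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30, 100, 300]},
    "Random Forest": {
        "n_estimators": [10, 20, 50],
        "max_depth": [None, 3, 5, 8],
    },
}


def tune_and_score(X_train, y_train, X_test, y_test, models, param_spaces,
                   n_iter=9, cv=3, n_jobs=-1):
    """Tune each model with RandomizedSearchCV and score it on the test set.

    Returns {model name: (best estimator, test R2)}.
    """
    report = {}
    for name, model in models.items():
        search = RandomizedSearchCV(
            model,
            param_distributions=param_spaces[name],
            n_iter=n_iter,
            cv=cv,
            scoring="r2",
            n_jobs=n_jobs,
            random_state=42,  # assumption: seeded so the sampled candidates repeat
        )
        search.fit(X_train, y_train)
        # best_estimator_ is refit on the full training set by default.
        best = search.best_estimator_
        report[name] = (best, r2_score(y_test, best.predict(X_test)))
    return report
```

Each search space should contain at least n_iter combinations; otherwise scikit-learn warns and simply evaluates every combination.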
application.py exposes:
- / for UI
- /predict for form-based prediction
Artifacts loaded at runtime:
- artifacts/model_1.pkl
- artifacts/preprocessor_1.pkl
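How the two routes and the two artifacts fit together can be sketched as below. This is an assumption-heavy illustration rather than the contents of application.py: the `create_app` factory, the form field names, the dataset column names, the JSON response, and the use of `pickle` for loading are all hypothetical (the real app may render HTML templates or use a different serializer).

```python
import pickle

import pandas as pd
from flask import Flask, jsonify, request


def load_artifact(path):
    """Load a pickled object; assumes the artifact was written with pickle."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def create_app(model, preprocessor):
    """Build a Flask app around an already-loaded model and preprocessor."""
    app = Flask(__name__)

    @app.route("/")
    def index():
        return "Student Math Score Predictor"

    @app.route("/predict", methods=["POST"])
    def predict():
        form = request.form  # a missing field yields a 400 response
        # Hypothetical form fields mapped to assumed dataset column names.
        row = pd.DataFrame([{
            "gender": form["gender"],
            "race/ethnicity": form["ethnicity"],
            "parental level of education": form["parental_level_of_education"],
            "lunch": form["lunch"],
            "test preparation course": form["test_preparation_course"],
            "reading score": float(form["reading_score"]),
            "writing score": float(form["writing_score"]),
        }])
        features = preprocessor.transform(row)
        prediction = model.predict(features)
        return jsonify({"math_score": float(prediction[0])})

    return app


# Wiring with the artifacts named in this README might look like:
# application = create_app(
#     load_artifact("artifacts/model_1.pkl"),
#     load_artifact("artifacts/preprocessor_1.pkl"),
# )
```

Passing the model and preprocessor into a factory keeps artifact loading out of the request path and makes the routes easy to exercise with stand-in objects.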
Key files:
- .elasticbeanstalk/config.yml
- .ebextensions/python.config
- Procfile
- application.py
Why this track:
- Managed platform lifecycle
- Faster app hosting with less infra overhead
Companion repo: https://github.com/Affan005-ai/AWS-CI-CD-Projects
What that track covers:
- Docker image build
- Push to Amazon ECR
- Deploy on EC2 self-hosted runner
- CI/CD automation in GitHub Actions
- Use Gunicorn-based runtime consistently
- Add CI quality gates (real tests/linting)
- Add image vulnerability scanning
- Use OIDC/IAM role instead of long-lived keys
- Add CloudWatch logging and alarms
- Add health checks and rollback strategy
pip install -r requirements.txt
python application.py
Open: http://127.0.0.1:5000
- This project and deployment learning journey were inspired by Krish Naik and his end-to-end ML engineering guidance.
- Thanks to the open-source Python, scikit-learn, Flask, Docker, and AWS communities for documentation and tooling support.
