From Model to Production: Dual AWS Deployment (Elastic Beanstalk + Docker/ECR/EC2 CI/CD)

End to End Machine Learning Project

Student Math Score Predictor

This repository is the Elastic Beanstalk deployment track on AWS for the Student Math Score Predictor project.
The Docker + ECR + EC2 CI/CD track is maintained in a separate companion repository.

Repository Map

Elastic Beanstalk deployment (this repo): https://github.com/Affan005-ai/ML-Projects
Docker + ECR + EC2 CI/CD repo: https://github.com/Affan005-ai/AWS-CI-CD-Projects

Problem Statement

Predict a student’s math score from:

Gender
Race/Ethnicity
Parental level of education
Lunch type
Test preparation course
Reading score
Writing score
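
As an illustration, a single prediction request could be modelled as a small dataclass. This is only a sketch: the class name, the field names, and the 0-100 score range are assumptions made for the example and are not taken from the repository.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StudentRecord:
    """One prediction request. All field names here are hypothetical."""

    gender: str
    race_ethnicity: str
    parental_level_of_education: str
    lunch: str
    test_preparation_course: str
    reading_score: float
    writing_score: float

    def __post_init__(self) -> None:
        # Assumption: both scores are on a 0-100 scale.
        for name in ("reading_score", "writing_score"):
            value = getattr(self, name)
            if not 0 <= value <= 100:
                raise ValueError(f"{name} must be between 0 and 100, got {value}")


record = StudentRecord(
    gender="female",
    race_ethnicity="group B",
    parental_level_of_education="bachelor's degree",
    lunch="standard",
    test_preparation_course="none",
    reading_score=72,
    writing_score=74,
)
features = asdict(record)  # plain dict, ready to become a one-row DataFrame
```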

Project Features

End-to-end ML pipeline from ingestion to artifact generation
Flask web app for live prediction
Multi-model regression training
Hyperparameter tuning with RandomizedSearchCV
AWS Elastic Beanstalk deployment setup
Iterative development, with fixes tracked in the commit history

End-to-End Architecture

Data ingestion reads Notebook/data/StudentsPerformance.csv
Train-test split saved to artifacts/train.csv and artifacts/test.csv
Transformation pipeline handles imputation, encoding, and scaling
Multiple regressors are trained and tuned
Best model + preprocessor saved in artifacts/
Flask app serves inference via /predict
Elastic Beanstalk hosts the application

ML Pipeline (Deep Dive)

1) Data Ingestion

Implemented in src/components/data_ingestion.py:

Reads source dataset
Saves raw/train/test CSV artifacts
Uses deterministic split with random_state=42
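
The ingestion step above can be sketched roughly as follows. This is not the code from src/components/data_ingestion.py: the function name `ingest`, the `raw.csv` file name, and `test_size=0.2` are assumptions for illustration. Only `random_state=42` and the train/test artifact names come from this README.

```python
import os

import pandas as pd
from sklearn.model_selection import train_test_split


def ingest(source_csv, artifacts_dir="artifacts", test_size=0.2):
    """Read the source CSV and write raw/train/test copies into artifacts_dir.

    Hypothetical helper: test_size and the raw.csv name are assumptions.
    """
    os.makedirs(artifacts_dir, exist_ok=True)
    df = pd.read_csv(source_csv)

    # Keep an untouched copy of the source data next to the splits.
    df.to_csv(os.path.join(artifacts_dir, "raw.csv"), index=False)

    # A fixed random_state makes the split reproducible across runs.
    train_df, test_df = train_test_split(df, test_size=test_size, random_state=42)

    train_path = os.path.join(artifacts_dir, "train.csv")
    test_path = os.path.join(artifacts_dir, "test.csv")
    train_df.to_csv(train_path, index=False)
    test_df.to_csv(test_path, index=False)
    return train_path, test_path
```

Calling `ingest("Notebook/data/StudentsPerformance.csv")` would then produce the two split files under artifacts/, and repeated runs would yield the same rows in each split.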

2) Data Transformation

Implemented in src/components/data_transformation.py:

Numerical features: median imputation + standard scaling
Ordinal features: most-frequent imputation + ordered (ordinal) encoding
Nominal features: most-frequent imputation + one-hot encoding
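
A minimal sketch of such a preprocessor with scikit-learn is shown below. It is not the repository's implementation: the column names, the choice of parental education as the only ordinal feature, and the category order in `EDUCATION_ORDER` are all assumptions for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Hypothetical column names and grouping (assumptions, not from the repo).
NUMERICAL = ["reading_score", "writing_score"]
ORDINAL = ["parental_level_of_education"]
NOMINAL = ["gender", "race_ethnicity", "lunch", "test_preparation_course"]

# Assumed ordering from lowest to highest education level.
EDUCATION_ORDER = [
    "some high school", "high school", "some college",
    "associate's degree", "bachelor's degree", "master's degree",
]


def build_preprocessor() -> ColumnTransformer:
    numerical = Pipeline([
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ])
    ordinal = Pipeline([
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("encoder", OrdinalEncoder(categories=[EDUCATION_ORDER])),
    ])
    nominal = Pipeline([
        ("imputer", SimpleImputer(strategy="most_frequent")),
        # Ignore categories at inference time that were unseen during fit.
        ("encoder", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer(
        [("num", numerical, NUMERICAL),
         ("ord", ordinal, ORDINAL),
         ("nom", nominal, NOMINAL)],
        sparse_threshold=0,  # always return a dense array
    )
```

Fitting this object on the training split and reusing the fitted instance at inference time keeps the encoding and scaling consistent between training and serving.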

3) Model Training and Selection

Implemented in src/components/model_trainer.py:

Trains multiple regressors
Compares R2 scores on the test set
Selects the best model, provided it clears a minimum quality threshold
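
The selection logic can be sketched along these lines. The function name `select_best_model`, the candidate models, the synthetic data, and the `min_r2=0.6` threshold are assumptions for illustration. The README does not state which regressors or which threshold value the project uses.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def select_best_model(models, X_train, y_train, X_test, y_test, min_r2=0.6):
    """Fit every candidate, score it on the test set, return the best one.

    min_r2 is a hypothetical quality threshold.
    """
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))

    best_name = max(scores, key=scores.get)
    if scores[best_name] < min_r2:
        raise ValueError(f"No model reached R2 >= {min_r2}: {scores}")
    return best_name, models[best_name], scores


# Synthetic stand-in data so the sketch runs without the project dataset.
X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
candidates = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=42),
}
best_name, best_model, scores = select_best_model(
    candidates, X_train, y_train, X_test, y_test
)
print(best_name, round(scores[best_name], 3))
```

Raising an error when no candidate clears the threshold prevents a weak model from being saved as an artifact by accident.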

4) Hyperparameter Tuning

Implemented in src/utils.py via evaluate_models(...):

Uses RandomizedSearchCV
n_iter=9, cv=3, scoring='r2', n_jobs=-1
Picks best estimator per model
Saves best overall model artifact
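
A rough sketch of this tuning loop follows. The search settings (n_iter=9, cv=3, scoring='r2', n_jobs=-1) are the ones listed above. Everything else is an assumption: the function is named `tune_models` to make clear it is not the real `evaluate_models(...)`, whose signature is not shown in this README, and the models, parameter grids, and `random_state=42` on the search are illustrative. Saving the winning model to disk is left out.

```python
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeRegressor

# Hypothetical candidates; each grid has exactly 9 combinations so that
# n_iter=9 covers it without sampling more points than exist.
MODELS = {
    "ridge": Ridge(),
    "decision_tree": DecisionTreeRegressor(random_state=42),
}
PARAM_GRIDS = {
    "ridge": {"alpha": [0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30, 100]},
    "decision_tree": {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5, 10]},
}


def tune_models(models, param_grids, X_train, y_train, X_test, y_test, n_jobs=-1):
    """Tune each model and return {name: (best_estimator, test_r2)}."""
    report = {}
    for name, model in models.items():
        search = RandomizedSearchCV(
            model,
            param_distributions=param_grids[name],
            n_iter=9,
            cv=3,
            scoring="r2",
            n_jobs=n_jobs,
            random_state=42,  # assumption: fixed seed for reproducible sampling
        )
        search.fit(X_train, y_train)
        # best_estimator_ is refit on the full training set by default.
        best = search.best_estimator_
        report[name] = (best, r2_score(y_test, best.predict(X_test)))
    return report
```

The best overall model is then the entry with the highest test R2, for example `max(report, key=lambda name: report[name][1])`.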

Inference Layer

application.py exposes:

/ for UI
/predict for form-based prediction

Artifacts loaded at runtime:

artifacts/model_1.pkl
artifacts/preprocessor_1.pkl
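
A simplified sketch of how such an app could be wired up is shown below. It is not the contents of application.py: the `create_app` factory, the form field names, and the JSON response are assumptions for illustration (the real app is form-based and presumably renders an HTML template). Only the two routes and the artifact file names come from this README.

```python
import os
import pickle

import pandas as pd
from flask import Flask, jsonify, request

# Hypothetical form field names (assumptions, not from the repo).
CATEGORICAL_FIELDS = [
    "gender", "race_ethnicity", "parental_level_of_education",
    "lunch", "test_preparation_course",
]
NUMERIC_FIELDS = ["reading_score", "writing_score"]


def load_pickle(path):
    # Only unpickle artifacts you produced yourself; pickle can run code.
    with open(path, "rb") as fh:
        return pickle.load(fh)


def create_app(artifacts_dir="artifacts"):
    app = Flask(__name__)
    model = load_pickle(os.path.join(artifacts_dir, "model_1.pkl"))
    preprocessor = load_pickle(os.path.join(artifacts_dir, "preprocessor_1.pkl"))

    @app.route("/")
    def index():
        # Placeholder for the UI page.
        return "Student Math Score Predictor"

    @app.route("/predict", methods=["POST"])
    def predict():
        form = request.form
        row = {name: form[name] for name in CATEGORICAL_FIELDS}
        # Form values arrive as strings, so scores are converted to floats.
        row.update({name: float(form[name]) for name in NUMERIC_FIELDS})
        features = pd.DataFrame([row])
        prediction = model.predict(preprocessor.transform(features))[0]
        return jsonify({"math_score": float(prediction)})

    return app
```

Loading both artifacts once at startup, rather than on every request, keeps per-request latency low.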

Deployment Track (Primary in This Repo): Elastic Beanstalk

Key files:

.elasticbeanstalk/config.yml
.ebextensions/python.config
Procfile
application.py

Why this track:

Managed platform lifecycle
Faster app hosting with less infra overhead

Deployment Track (Related Repo): Docker + ECR + EC2 + GitHub Actions

Companion repo: https://github.com/Affan005-ai/AWS-CI-CD-Projects

What that track covers:

Docker image build
Push to Amazon ECR
Deploy to EC2 through a self-hosted GitHub Actions runner
CI/CD automation in GitHub Actions

Production-Grade Improvements

Use Gunicorn-based runtime consistently
Add CI quality gates (real tests/linting)
Add image vulnerability scanning
Use OIDC/IAM role instead of long-lived keys
Add CloudWatch logging and alarms
Add health checks and rollback strategy

One Screenshot Proof

Quick Start

Install the dependencies and start the Flask app:

pip install -r requirements.txt
python application.py

Then open:

http://127.0.0.1:5000

Credits

This project and deployment learning journey were inspired by Krish Naik and his end-to-end ML engineering guidance.

Acknowledgment

Thanks to the open-source Python, scikit-learn, Flask, Docker, and AWS communities for documentation and tooling support.
