This project demonstrates a production-ready MLOps pipeline for automated training, logging, and delivery of Machine Learning models using MLflow and GitHub Actions.
- Code Push: Developer pushes code to the `main` branch.
- CI/CD Pipeline:
  - Training: `train.py` executes, training a RandomForestClassifier on the Iris dataset.
  - Tracking: Hyperparameters, metrics (accuracy), and tags are logged via MLflow Tracking.
  - Artifacts: Confusion matrices and classification reports are saved as experiment artifacts.
  - Registration: The model is automatically registered in the MLflow Model Registry as `IrisClassifierFinal`.
- Automated Testing: `pytest` validates the registered model's performance and data consistency.
- Deployment (Staging): If tests pass, the model is prepared for transition to the `Staging` stage.
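The training step above can be sketched as follows. This is a minimal illustration using scikit-learn only; the hyperparameters shown are assumptions, not the project's actual values, and the MLflow logging calls that `train.py` performs are indicated in comments.

```python
# Minimal sketch of the training step (hyperparameters are illustrative
# assumptions, not the project's actual values).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

preds = model.predict(X_test)
accuracy = accuracy_score(y_test, preds)

# In train.py, these values are logged to MLflow inside an active run, e.g.:
#   mlflow.log_param("n_estimators", 100)
#   mlflow.log_metric("accuracy", accuracy)
# and the confusion matrix / classification report are saved as run artifacts.
print(f"accuracy={accuracy:.3f}")
print(confusion_matrix(y_test, preds))
print(classification_report(y_test, preds))
```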
- Languages: Python 3.9+
- ML Engine: Scikit-Learn
- Operations: MLflow (Tracking & Registry)
- CI/CD: GitHub Actions
- Testing: Pytest
- Data: Pandas, NumPy
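A `requirements.txt` matching this stack might look like the following (the exact entries and any version pins are assumptions, not taken from the project):

```text
scikit-learn
mlflow
pytest
pandas
numpy
```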
```
mlops_challenge/
├── .github/workflows/ml_pipeline.yml   # CI/CD Pipeline definition
├── tests/
│   └── test_model.py                   # Validation tests for registry models
├── train.py                            # Training & Registration script
├── requirements.txt                    # Dependency list
├── README.md                           # Documentation
└── .gitignore                          # Git exclusions
```
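The pipeline definition in `.github/workflows/ml_pipeline.yml` could look roughly like this sketch; the job name, Python version, and exact steps are assumptions, not the project's actual workflow:

```yaml
name: ML Pipeline

on:
  push:
    branches: [main]

jobs:
  train-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.9"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Train and register model
        run: python train.py
      - name: Validate registered model
        run: MODEL_URI="models:/IrisClassifierFinal/latest" python -m pytest tests/test_model.py
```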
```bash
pip install -r requirements.txt
```

In a separate terminal:

```bash
mlflow ui
```

```bash
python train.py
```

Ensure you provide the correct Model URI from the MLflow UI (PowerShell syntax shown):

```powershell
$env:MODEL_URI="models:/IrisClassifierFinal/latest"; python -m pytest tests/test_model.py
```

We use `mlflow.set_tag()` to attach Git commit hashes and user metadata to every run, making each run traceable and auditable in production.
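The tagging pattern described above can be sketched like this. `build_run_tags` is a hypothetical helper name, not part of the project's code, and the git lookup assumes the script runs inside a repository (falling back to "unknown" otherwise):

```python
import getpass
import subprocess

def build_run_tags():
    """Collect run metadata to attach via mlflow.set_tags().

    build_run_tags is a hypothetical helper, not part of the project's code.
    """
    try:
        # Current commit hash; fails outside a git repo or if git is missing.
        commit = subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip()
    except (subprocess.CalledProcessError, FileNotFoundError):
        commit = "unknown"
    try:
        user = getpass.getuser()
    except OSError:
        user = "unknown"
    return {"git_commit": commit, "triggered_by": user}

# Inside train.py, within an active run, the tags would be attached with:
#   mlflow.set_tags(build_run_tags())
tags = build_run_tags()
print(tags)
```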