This repository is part of my MLOps coursework in the Master of Science in Data Science program at the University of San Francisco. The goal of the class is to learn how to build reliable, reproducible, and production-ready machine learning workflows.
- Develop an end-to-end ML pipeline from data ingestion to model deployment
- Practice building modular code and workflows using MLOps tools
- Learn how to track data, parameters, and experiments
- Collaborate using Git and GitHub best practices
| Tool | Purpose |
|---|---|
| Python | Main programming language |
| DVC | Data & model version control |
| Git / GitHub | Code versioning and collaboration |
| VS Code | Development environment |
| Pandas, scikit-learn | Data wrangling & preprocessing |
✅ Milestone 1: Data Preprocessing + DVC Integration
- Clean and encode data
- Track raw and processed datasets with DVC
- Reproducible preprocessing script with `dvc.yaml`
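
As a rough illustration of the "clean and encode" step, here is a minimal sketch using Pandas. It is not the actual script in this repository: the function name `preprocess`, the `categorical_cols` parameter, and the `age` / `city` columns are all hypothetical placeholders. In a DVC stage, a function like this would typically be wrapped in a script that reads the raw file and writes the processed one.

```python
import pandas as pd


def preprocess(df: pd.DataFrame, categorical_cols: list[str]) -> pd.DataFrame:
    """Clean and encode a raw DataFrame (illustrative sketch only)."""
    # Cleaning: drop exact duplicate rows, then rows with missing values.
    cleaned = df.drop_duplicates().dropna()
    # Encoding: one-hot encode the named categorical columns as 0/1 integers.
    return pd.get_dummies(cleaned, columns=categorical_cols, dtype=int)


if __name__ == "__main__":
    # Hypothetical toy data standing in for the raw dataset.
    raw = pd.DataFrame(
        {"age": [34, 34, None, 51], "city": ["SF", "SF", "LA", "NYC"]}
    )
    print(preprocess(raw, ["city"]))
```

Dropping rows with missing values is the simplest cleaning choice for a sketch; imputation may be a better fit depending on the dataset.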
Next steps:
- 🧠 Add model training stage
- 📊 Add evaluation + metrics logging
- 🚀 Set up model serving (Flask or FastAPI)