A starter template for machine learning projects as part of my newsletter series: https://www.sarahglasmacher.com/ml-repo-structure-challenge/
- Project structure for ML workflows using a Python package for easier deployment later on
- Example scripts for data processing, training, and evaluation
- Pydantic models for validating the YAML configuration files
- A pyproject.toml file for dependency and project management
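As a rough illustration of the Pydantic point above, here is a minimal sketch of loading a YAML config and validating it against a model. The `TrainingConfig` class, its field names, and the `load_config` helper are hypothetical and not taken from this repository. The sketch assumes `pydantic` and `PyYAML` are installed.

```python
import yaml
from pydantic import BaseModel, Field, ValidationError


class TrainingConfig(BaseModel):
    """Hypothetical config schema; the field names are illustrative only."""

    data_path: str
    learning_rate: float = Field(gt=0)                 # must be positive
    n_estimators: int = Field(default=100, ge=1)       # at least one estimator
    test_size: float = Field(default=0.2, gt=0, lt=1)  # fraction between 0 and 1


def load_config(yaml_text: str) -> TrainingConfig:
    """Parse YAML text and validate it against the schema."""
    raw = yaml.safe_load(yaml_text)
    return TrainingConfig(**raw)


GOOD_YAML = """
data_path: data/train.csv
learning_rate: 0.05
"""

BAD_YAML = """
data_path: data/train.csv
learning_rate: -1
"""

config = load_config(GOOD_YAML)
print(config.learning_rate, config.n_estimators)

try:
    load_config(BAD_YAML)
except ValidationError:
    # A negative learning rate violates the gt=0 constraint.
    print("invalid config rejected")
```

With this approach a typo or out-of-range value in a .yaml file fails when the config is loaded, instead of surfacing halfway through a training run.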
Fork the repository to make it your own, then:
- Clone the repository:
    git clone https://github.com/yourusername/ml-blueprint.git
    cd ml-blueprint
- Install dependencies:
    uv sync
This repo uses example data from the following Kaggle dataset, for demonstration purposes only: https://www.kaggle.com/competitions/playground-series-s4e12/data
To use the repo, add the data yourself in the following structure:
ml-blueprint/
├── artifacts/
...
├── data # add here
│ ├── playground-series-s4e12 # add this directory by downloading it from Kaggle
│ │ ├── sample_submission.csv
│ │ ├── test.csv
│ │ └── train.csv
│ ├── test.csv # generated by code
│ └── train.csv # generated by code
├── notebooks
│ └── one_stop_notebook.ipynb
├── scripts/
...
├── src/
│ └── <your_pkg>/
...
├── pyproject.toml
├── README.md
└── uv.lock
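Before running any scripts, it can help to confirm that the Kaggle files sit where the layout above expects them. The following is a small sketch using only the standard library. The `missing_data_files` helper is hypothetical and not part of the repository. It assumes it is run from the repository root.

```python
from pathlib import Path

# Location and file names taken from the directory layout above.
DATASET_DIR = Path("data") / "playground-series-s4e12"
EXPECTED_FILES = ("sample_submission.csv", "test.csv", "train.csv")


def missing_data_files(repo_root: Path) -> list[Path]:
    """Return the expected Kaggle files that are not present under repo_root."""
    dataset_dir = repo_root / DATASET_DIR
    return [
        dataset_dir / name
        for name in EXPECTED_FILES
        if not (dataset_dir / name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_data_files(Path.cwd())
    if missing:
        print("Missing data files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All Kaggle files are in place.")
```

The check only covers the downloaded files, since the top-level train.csv and test.csv under data/ are produced later by the code.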