Sample repository to use as a blueprint for ML experimentation projects. It uses DVC for data versioning and MLflow for experiment tracking.
- initialize DVC in the repository ->
dvc init
- install the DVC Git hooks (post-checkout, pre-commit, pre-push) ->
dvc install
- list the tracked data in the data registry ->
dvc list https://github.com/iamsoroush/dvc-minio-data-registry --rev main
- import the datasource(s) ->
dvc import https://github.com/iamsoroush/dvc-minio-data-registry.git "datasources/RSNA" -o data/
- or update an already imported dataset to the latest version ->
dvc update datasources/pacs.dvc
- import the task meta-data ->
dvc import https://github.com/iamsoroush/dvc-minio-data-registry.git "tasks/hemo" -o data/hemo
- add and commit dvc files in the data folder ->
git add data/.gitignore data/*.dvc; git commit data/.gitignore data/*.dvc -m "add RSNA datasource and hemo meta-data"
- prepare the data:
python prepare.py --meta-data data/hemo/meta-data.csv --output-dir data/hemo
- train:
python train.py --conf config.yaml
- evaluate:
python evaluate.py --conf config.yaml
- export:
python export.py --conf config.yaml
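
The four experiment steps above (prepare, train, evaluate, export) are meant to run in that order, so they could be chained from a small driver script. The following is only a minimal sketch: the function names `pipeline_commands` and `run_pipeline` are hypothetical and not part of this repository, and it assumes the scripts are launched from the repository root and exit with a non-zero status on failure.

```python
import subprocess
import sys


def pipeline_commands(conf="config.yaml",
                      meta_data="data/hemo/meta-data.csv",
                      output_dir="data/hemo"):
    """Build the prepare/train/evaluate/export commands in execution order.

    Hypothetical helper: the default paths mirror the commands listed above.
    """
    python = sys.executable  # reuse the interpreter that runs this script
    return [
        [python, "prepare.py", "--meta-data", meta_data, "--output-dir", output_dir],
        [python, "train.py", "--conf", conf],
        [python, "evaluate.py", "--conf", conf],
        [python, "export.py", "--conf", conf],
    ]


def run_pipeline(commands, runner=subprocess.run):
    """Run each command in turn.

    With the default runner, check=True raises CalledProcessError on the
    first failing step, so later steps are skipped.
    """
    for cmd in commands:
        print("running:", " ".join(cmd))
        runner(cmd, check=True)
```

Calling `run_pipeline(pipeline_commands())` from the repository root would then execute the steps one after another. The `runner` parameter exists only so the command list can be inspected or dry-run without starting any process.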