@weiji14 weiji14 commented Mar 12, 2021

To have fully reproducible data science pipelines, let's use dvc! The data will be stored and viewable at https://dagshub.com/weiji14/deepicedrain.

TODO:

  • Add dvc dependency to pyproject.toml (21d691c)
  • Set up dvc via `dvc init` and configure the dvc remote (c7fc42a)
  • Push up some data to the dvc remote and do something about reproducing PNG figures? (TODO in separate PRs)

Cross-reference GenericMappingTools/pygmt#1036, which has good step-by-step instructions and many links to external references.

@weiji14 weiji14 added the data 🗃️ (Pull requests that update input datasets) and feature 🚀 (Brand new feature) labels Mar 12, 2021
@weiji14 weiji14 added this to the v0.5.0 milestone Mar 12, 2021
@weiji14 weiji14 self-assigned this Mar 12, 2021
Initialize with the standard dvc files and folders. The DVC remote configuration points to https://dagshub.com/weiji14/deepicedrain. The .dvc/.gitignore is the default one generated by `dvc init`, and .dvcignore is empty for now, to be populated later. Each user also needs to run the following locally to update the .dvc/config.local file:

```
dvc remote modify origin --local auth basic
dvc remote modify origin --local user "$DAGSHUB_USER"
dvc remote modify origin --local password "$DAGSHUB_PASS"  # generate token at https://dagshub.com/user/settings/tokens
```
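The three commands above write an empty value if `DAGSHUB_USER` or `DAGSHUB_PASS` is unset, which can be hard to notice later. As a rough sketch (not part of this PR), the same steps could be wrapped in a small guard. The function names `require_env` and `configure_dagshub_auth` are hypothetical, and the remote is assumed to be named `origin` as in the commands above:

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): succeed only if every named
# environment variable is set to a non-empty value.
require_env() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      return 1
    fi
  done
  return 0
}

# Hypothetical wrapper: store the credentials in .dvc/config.local via
# --local, stopping at the first command that fails.
configure_dagshub_auth() {
  require_env DAGSHUB_USER DAGSHUB_PASS || return 1
  dvc remote modify origin --local auth basic &&
    dvc remote modify origin --local user "$DAGSHUB_USER" &&
    dvc remote modify origin --local password "$DAGSHUB_PASS"
}
```

After exporting the two variables, calling `configure_dagshub_auth` from the repository root should have the same effect as the three commands, and it returns a non-zero status without touching the config if either variable is missing.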
@weiji14 weiji14 merged commit 85615e8 into main Mar 15, 2021
@weiji14 weiji14 deleted the init/data_version_control branch March 15, 2021 03:56