This repository was archived by the owner on Dec 31, 2025. It is now read-only.

Commit 3532f9a

Merge branch 'text/datamgmt-Nov3' of github.com:poldrack/BetterCodeBetterScience into text/datamgmt-Nov3
2 parents cea4317 + e304862 commit 3532f9a

1 file changed: +3 -2 lines changed


book/data_management.md

Lines changed: 3 additions & 2 deletions
@@ -974,7 +974,7 @@ create(ok): my_datalad_repo (dataset)

```

-This creates a new directory, called `my_datalad_repo` and sets it up as a DataLad subdataset within our main git repo. We then download some data files from another project using the `datalad download-url` function, which will both download the data and save them into the datalad dataset:
+This creates a new directory, called `my_datalad_repo` and sets it up as a DataLad dataset. We then go into the directory and create a subdirectory called `data`, and then download some data files from another project. We do this using the `datalad download-url` function, which will both download the data and save them to the datalad dataset:

```bash
➤ datalad download-url -d . -O my_datalad_repo/data/ https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/demographics.csv
@@ -1023,7 +1023,7 @@ Date: Mon Dec 15 13:40:29 2025 -0800
https://raw.githubusercontent.com/IanEisenberg/Self_Regulation_Ontology/refs/heads/master/Data/Complete_02-16-2019/demographics.csv
```

-Here we see the commit messages that were automatically created by DataLad for downloading the URLS. The `datalad download-url` function adds the URL to the log, which is useful for provenance tracking. If one wishes to download a large number of files, there is also a `datalad addurls` command that can download multiple files based on a single text file containing the relevant URLs and information.
+Here we see the commit messages that were automatically created by DataLad, first for creating the new dataset and then for downloading the URLS. The `datalad download-url` function adds the URL to the log, which is useful for provenance tracking.

#### Modifying files

@@ -1191,6 +1191,7 @@ action summary:
One can also push data using DataLad to a range of other remote hosts; see the [DataLad documentation](https://handbook.datalad.org/en/latest/basics/101-138-sharethirdparty.html) for more on this.


+
## Archiving data

At the end of a project the data may seem like they are no longer needed, but in many cases there are reasons to retain the data beyond the end of the project. Funding agencies often have a required data retention period beyond the end of the grant. For example, the US National Institutes of Health (NIH) requires that records be retained for [three years](https://grants.nih.gov/grants/policy/nihgps/HTML5/section_8/8.4.2_record_retention_and_access.htm) beyond the end of the funding. Some universities also have their own data retention requirements; for example, my institution (Stanford University) also has a [three-year data retention requirement](https://doresearch.stanford.edu/policies/research-policy-handbook/conduct-research/retention-and-access-research-data), whereas Johns Hopkins University has a [five-year retention requirement](https://www.hopkinsmedicine.org/institutional-review-board/guidelines-policies/guidelines/record-retention). In my opinion it is preferable to retain data, at least in archival form, as long as possible. I have received requests to share data more than 15 years after the original study completion, and it was only due to long-term retention of these data that we were able to honor these requests.
