[ENH][BEP028] Specification update for BEP028 BIDS-Prov #2099
base: master
Hi Boris,
Thanks a lot for this new version of the BIDS-Prov spec.
As discussed by email I reviewed the "key concepts" section as well as the outline.
Overall, it looks great and I think the outline gives a good flow of information.
One suggestion would be:
- Overview
  - Goals
  - General principles
  - Key concepts
- Provenance files
  - Activities
  - Entities
  - Software
  - Environments
- Provenance of a BIDS file
  - Sidecar JSON
  - Provenance files
    - Activities
    - Software
    - Environments
    - Entities
- Provenance of a BIDS dataset
  - Description using provenance records
  - Description of processes or pipelines
- Consistency and uniqueness of identifiers
  - Identifiers for entities
  - Identifiers for other provenance records
- Minimal examples
  - Provenance of a BIDS raw dataset
  - Provenance of a BIDS study dataset
And the section "Provenance of a BIDS dataset" would refer back to the subsection "Provenance files" as needed.
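The "Consistency and uniqueness of identifiers" section could be made concrete with a check like the one below. This is a minimal Python sketch, assuming each provenance record is a JSON object with an `Id` key and that references to other records use keys such as `AssociatedWith`, `Used` and `GeneratedBy`; these key names and the `urn:example:` identifiers are hypothetical here.

```python
from collections import Counter

# Hypothetical reference keys: record fields whose values are Ids of other records.
REF_KEYS = ("AssociatedWith", "Used", "GeneratedBy")


def duplicate_ids(records):
    """Return the sorted Ids that appear on more than one record."""
    counts = Counter(rec["Id"] for rec in records)
    return sorted(ident for ident, n in counts.items() if n > 1)


def dangling_references(records, ref_keys=REF_KEYS):
    """Return the sorted referenced Ids that match no record's Id."""
    known = {rec["Id"] for rec in records}
    missing = set()
    for rec in records:
        for key in ref_keys:
            value = rec.get(key)
            if value is None:
                continue
            # A reference may be a single Id or a list of Ids.
            refs = value if isinstance(value, list) else [value]
            missing.update(ref for ref in refs if ref not in known)
    return sorted(missing)


records = [
    {"Id": "urn:example:dcm2niix"},
    {"Id": "urn:example:conversion", "AssociatedWith": "urn:example:dcm2niix"},
    {"Id": "urn:example:t1w", "GeneratedBy": "urn:example:conversion"},
    {"Id": "urn:example:t1w", "GeneratedBy": "urn:example:defacing"},
]
print(duplicate_ids(records))        # ['urn:example:t1w']
print(dangling_references(records))  # ['urn:example:defacing']
```

Uniqueness and consistency are kept as two separate checks so that a validator could report them as distinct issues.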
Provenance records are described as JSON objects in BIDS. They are stored inside **provenance files** (see [Provenance files](#provenance-files)).
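As an illustration of provenance records stored as JSON objects in a provenance file, here is a minimal Python sketch. The top-level keys (`Activities`, `Software`), the record fields (`Id`, `Label`, `Command`, `AssociatedWith`, `Version`) and the `urn:example:` identifiers are assumptions for illustration, not the normative BIDS-Prov schema.

```python
import json


def make_provenance_file(activities, software):
    """Serialize activity and software records into provenance-file JSON text.

    The top-level layout {"Activities": [...], "Software": [...]} is a
    hypothetical choice for this sketch.
    """
    return json.dumps({"Activities": activities, "Software": software}, indent=2)


software = [{"Id": "urn:example:dcm2niix", "Label": "dcm2niix", "Version": "1.0"}]
activities = [
    {
        "Id": "urn:example:conversion",
        "Label": "DICOM to NIfTI conversion",
        "Command": "dcm2niix -o sub-01/anat sourcedata/sub-01",
        # The activity points to the software record by Id instead of embedding it.
        "AssociatedWith": "urn:example:dcm2niix",
    }
]

text = make_provenance_file(activities, software)
parsed = json.loads(text)
print(parsed["Activities"][0]["AssociatedWith"])  # urn:example:dcm2niix
```

Linking records by `Id` rather than nesting them keeps each record defined once, so several activities can share the same software record.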
Additionally, **provenance metadata** of entities can be stored as regular BIDS metadata inside:
Additionally, **provenance metadata** of entities can be stored as regular BIDS metadata inside sidecar JSON files (see [Provenance of a BIDS file](#provenance-of-a-bids-file)).
- sidecar JSON files (see [Provenance of a BIDS file](#provenance-of-a-bids-file));
- `dataset_description.json` files (see [Provenance of a BIDS dataset](#provenance-of-a-bids-dataset)).
Finally, activities responsible for the creation of the dataset can be stored in `dataset_description.json` files (see [Provenance of a BIDS dataset](#provenance-of-a-bids-dataset)).
Activities are not stored in `dataset_description.json`.
Is your suggestion only about writing this as a list, or is it a matter of meaning?
But a link to an Activity is stored in `dataset_description.json`? If yes, then the text above can be amended as follows: "Finally, activities responsible for the creation of the dataset can be linked from "
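The distinction discussed here (the activity record itself lives in a provenance file, while `dataset_description.json` only holds a link to it) can be sketched in Python as follows. The key holding the links (passed in as `key`, here the hypothetical `ProvenanceActivities`) and the record fields are assumptions for illustration; the existing BIDS `GeneratedBy` field is not reused because its entries are objects rather than Ids.

```python
def resolve_activity_links(description, activities, key):
    """Return the activity records whose Ids are listed under description[key].

    Raises KeyError if a linked Id has no matching activity record.
    """
    by_id = {act["Id"]: act for act in activities}
    return [by_id[ident] for ident in description.get(key, [])]


# Hypothetical dataset_description.json content: only Ids, no activity bodies.
dataset_description = {
    "Name": "Example raw dataset",
    "BIDSVersion": "1.10.0",
    "ProvenanceActivities": ["urn:example:conversion"],
}
# Activity records as they might appear in a provenance file.
activities = [
    {"Id": "urn:example:conversion", "Label": "DICOM to NIfTI conversion"},
    {"Id": "urn:example:defacing", "Label": "Defacing"},
]

linked = resolve_activity_links(dataset_description, activities, "ProvenanceActivities")
print([act["Label"] for act in linked])  # ['DICOM to NIfTI conversion']
```

Under this reading, "linked from" is the more accurate wording: removing the provenance file would leave a dangling Id in `dataset_description.json`, not a missing activity description.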
Here are the outstanding issues I can see, given the current state of the pull requests:
This is another set of proposed updates, as per our discussions.
### Provenance of a BIDS raw dataset
Consider the following BIDS raw dataset:
To ease reading of this section, it would be nice to add some textual description of what is found in the dataset, something along the lines of "following BIDS raw dataset that contains a single T1-weighted image that was generated from a set of DICOM files:"
### Provenance of a BIDS derivative dataset
Consider the following BIDS derivative dataset:
Same here, a few words describing the dataset would be great.
I had a chance to talk with @rwblair about the arbitrary subdirectories yesterday. At least from the examples given (
Although a bit different from how BIDS has done things (data type before first entity directory), the machinery we already have in the schema is sufficient to encode this and the changes to the validator should not be difficult.
@effigies, thanks for the input on the arbitrary subdirectories! We'll discuss that with Yarik and Camille tomorrow.
FWIW, I like it since it is generic. Might also be applicable to e.g. BEP044:Stimuli where ATM stimuli files are not grouped but have But overall it comes to the question of when it is worth keeping things flat vs creating those folders, and it is kinda a generic aspect: e.g. if there is a BIDS dataset with only T1w images for 10 subjects, folders do not really make it easier to navigate the data. Somewhat of a use case for Similarly here: if it is just a single "stage" derivative dataset (e.g. a bids-app applied in one go across all subjects), there is no need for subfolders there, right?
I have no objections to keeping it flat. I was under the impression that nesting was important.
@effigies I see now that you proposed to make Per our discussion I also would recommend establishing I lean toward always requiring them for the sake of consistency.
We don't, but it is a convention, not a technical problem.
I defer to the BEP leads to make the proposal they want.
This is a work in progress PR proposing a specification update for BEP028 BIDS-Prov.
- [ ] being proofread
- [ ] validator error: `/prov/*` NOT_INCLUDED
- [ ] validator error: `/prov/*.json` SIDECAR_WITHOUT_DATAFILE
- [ ] validator error: derivative files are listed as NOT_INCLUDED / ALL_FILENAME_RULES_HAVE_ISSUES / FILENAME_MISMATCH / ENTITY_WITH_NO_LABEL