[ENH] Add "study" DatasetType to organize a collection of source and derivative datasets #1972
d0d5c37
to
fb4f5a4
Compare
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@           Coverage Diff           @@
##           master    #1972   +/-   ##
=======================================
  Coverage   82.71%   82.71%
=======================================
  Files          20       20
  Lines        1608     1608
=======================================
  Hits         1330     1330
  Misses        278      278
FWIW, we conversed with @effigies and he brought up an interesting argument, although IMHO not contradicting this one per se: ATM any BIDS dataset (raw or derivative) which already contains some subdatasets under
edit: related, linked below, is #2103, highlighting the same situation with a "raw" dataset containing "derivatives/"
@effigies I wonder if we should extend
I'm skeptical of that need. I would expect your
And
Subdatasets should be validatable BIDS datasets in their own right, avoiding the need for a top-level dataset_description.json to modify how they are intended to be validated.
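A minimal sketch of that idea in Python: every directory under `sourcedata/` or `derivatives/` that carries its own `dataset_description.json` is treated as an independent dataset to hand to a validator. The function name `find_subdatasets` and the choice of those two parent directories are assumptions for illustration only; nothing here is defined by the spec or by the validator.

```python
import json
from pathlib import Path


def find_subdatasets(root, parents=("sourcedata", "derivatives")):
    """Yield (path, DatasetType) for each direct child of `parents`
    that has its own dataset_description.json (hypothetical helper)."""
    root = Path(root)
    for parent in parents:
        base = root / parent
        if not base.is_dir():
            continue
        for child in sorted(base.iterdir()):
            desc = child / "dataset_description.json"
            if desc.is_file():
                meta = json.loads(desc.read_text())
                # BIDS treats a missing DatasetType as "raw"
                yield child, meta.get("DatasetType", "raw")
```

Each yielded path could then be validated on its own, without consulting the top-level `dataset_description.json`.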
I think this overall needs more specification. What are valid directories in a
I think a project dataset is barely worth specifying if we don't validate at least the raw data subdataset. Possibly we should have rules for indicating where validators should look for subdatasets. In OpenNeuroDerivatives, we use
I have yet to "process" this, but a quick side idea inspired by #1928: I wonder if there is a hierarchy here: project (everything common) -> raw (current default, requires having sub- folder(s)) -> derivative (more stuff could be added), where every next level adds capabilities but includes all of the prior one, as a derivative could include raw in it? Or do we already have something which invalidates that?
They are already there, that's somewhat the point here -- that we are already defining the structure of all those folders, nothing new to add.
I think the presence of subdatasets is not really the differentiator here, and formalizing rules for their validation is orthogonal to this issue. Keeping in mind my prior observation that "raw" is pretty much "a project with data in sub-* folders", we might be circling back to that issue of requiring
It could even be kinda nice: we would let people start their "raw BIDS datasets" as "project BIDS datasets" where they plan (README, code/ etc.) until they start populating them with data, thus becoming
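The observation that "raw" is roughly "a project (later renamed study) dataset with data in sub-* folders" can be sketched as a check on the top-level entries. This is a hypothetical heuristic, not validator behavior: `guess_dataset_type` is an invented name, and the sketch ignores both the `DatasetType` field and derivative datasets.

```python
from pathlib import Path


def guess_dataset_type(root):
    """Hypothetical heuristic: 'raw' if any top-level sub-* directory
    exists, otherwise 'study' (the planning stage: README, code/, ...)."""
    root = Path(root)
    has_subjects = any(
        p.is_dir() and p.name.startswith("sub-") for p in root.iterdir()
    )
    return "raw" if has_subjects else "study"
```

Under this reading, a dataset would change type simply by gaining its first `sub-*` folder.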
9c68bf8
to
6f236ce
Compare
@effigies Following your idea, I have now added a "warning" (to reflect the level of the analogous SubjectFolders check in "raw" BIDS). I guess, in principle, we could take this as an opportunity to revert
While discussing with @jbpoline we wondered if maybe `study` would be a better descriptor to use here than `project`. We also mention "study" in various places in BIDS, which seems to align nicely here:
❯ git grep study
src/CHANGES.md:- \[FIX] update physio bids name in longitudinal study page examples [#863](https://github.com/bids-standard/bids-specification/pull/863) ([Remi-Gau](https://github.com/Remi-Gau))
src/appendices/coordinate-systems.md:The following template identifiers are RECOMMENDED for individual- and study-specific reference
src/appendices/coordinate-systems.md:In the case of multiple study templates, additional names may need to be defined.
src/appendices/coordinate-systems.md:| study | Custom space defined using a group/study-specific template. This coordinate system requires specifying an additional file to be fully defined. |
src/appendices/hed.md:numerical values that are similar across the recordings in the study.
src/appendices/hed.md:repository on GitHub should be used to validate the study event annotations.
src/common-principles.md: unless when appropriate given the study goals, for example, when scanning babies.
src/introduction.md:> The data used in the study were organized using the
src/modality-specific-files/genetic-descriptor.md: "Dataset": "https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001364.v1.p1",
src/modality-specific-files/intracranial-electroencephalography.md:Note that the date and time information SHOULD be stored in the study key file
src/modality-specific-files/magnetic-resonance-spectroscopy.md:acquisition parameters in filenames is helpful or necessary to distinguish datasets in a given study.
src/modality-specific-files/motion.md:Note that the onsets of the recordings SHOULD be stored in the study key file [(`scans.tsv`)](../modality-agnostic-files.md#scans-file).
src/modality-specific-files/positron-emission-tomography.md:This entity is OPTIONAL if only one tracer is used in the study,
src/modality-specific-files/task-events.md:Please mind that this does not imply that only so called "event related" study designs
src/schema/objects/common_principles.yaml: A set of neuroimaging and behavioral data acquired for a purpose of a particular study.
src/schema/objects/common_principles.yaml: Session can (but doesn't have to) be synonymous to a visit in a longitudinal study.
src/schema/objects/common_principles.yaml: A person or animal participating in the study.
src/schema/objects/entities.yaml: For example, this should be used when a study includes two T1w images -
src/schema/objects/entities.yaml: Session can (but doesn't have to) be synonymous to a visit in a longitudinal study.
src/schema/objects/entities.yaml: A person or animal participating in the study.
src/schema/objects/enums.yaml:study:
src/schema/objects/enums.yaml: value: study
src/schema/objects/enums.yaml: display_name: study
src/schema/objects/enums.yaml: Custom space defined using a group/study-specific template.
src/schema/objects/metadata.yaml: Reference to the study/studies on which the implementation is based.
src/schema/objects/metadata.yaml: The version of the HED schema used to validate HED tags for study.
tools/schemacode/src/bidsschematools/tests/data/broken_dataset_description.json:"EthicsApprovals": ["The original study from which this BIDS example dataset was derived was approved by the Ethics committee of Ghent University Hospital with identifier EC 2017/1103."]
and "project" mentions are not particularly aligned. So, I think, we should just make it a "study", hence renaming accordingly.
This reverts commit a3c12f8 where I have tried to introduce it in bids-standard#1741 but it required a little more of further detailing.
Idea from @effigies while discussing this PR at BIDS Maintainers meeting 2025
…ith SubjectFolders check Also adjusted wording to be aligned too
While discussing with @jbpoline we wondered if maybe `study` would be a better descriptor to use here in favor of `project`. One of the rationales is that e.g. in [BEP035](https://bids.neuroimaging.io/extensions/beps/bep_035.html) (attn @bids-standard/bep035) on Mega-analysis they introduce the `study-` entity as a grouping element, so it would kinda match natively. We also mention "study" in various places in BIDS (per `git grep study`), which seems to align nicely here, while "project" mentions are not particularly aligned. So, I think, we should just make it a "study", hence renaming accordingly.
154a8cc
to
315c08f
Compare
Sorry, I do not see how this relates to this PR, since having "valid BIDS datasets in sourcedata/" seems to point to a "derivative BIDS" dataset by its definition.
edit: the point is that a "study" dataset could even be empty to start with, then start collecting various other
@effigies who, among maintainers, do you think might also be interested to review this PR?
But what are you validating if there are no valid subdatasets? Why are you running the validator?
dang... I guess I mixed up this PR with something else, since I thought that we had merged it!
I made some edits above; maybe they came in during your question?
So the point is that you would validate what is available -- that there is no to get some examples.
@effigies could you invite or nominate a few other PR reviewers who might potentially be interested? ;-)
Thanks @yarikoptic. A `study` BIDS dataset would be useful for the LINC project, where we are disseminating source, raw, and derived data together.
Thank you, @yarikoptic.
Co-authored-by: Kabilar Gunalan <[email protected]>
6e3b18b
to
02a8074
Compare
Hi @yarikoptic @effigies, I am new to the BIDS release process. What are the next steps to get this merged and released as part of the latest BIDS spec? Thanks.
sounds good to me for a start!
2 maintainers + 1 (associated) contributor approvals. How many more to request/wait for? ;)
One trivial click for a maintainer, one giant leap for the BIDSkind! Thank you @julia-pfarr ! ;-)
…d "docs" where was missing (#2185)
* Add "directories" description for "study" DatasetType and add "docs" where was missing
Apparently I missed this file entirely when I was preparing #1972 (study DatasetType), and due to all the duplication we (I) also missed that docs was either not listed among "root.subdirs" or not listed at all. This provides a fix, but I wonder if we could/should avoid the duplication altogether. As I have argued in #1972, I feel that "study" is the base dataset type and the next ones just potentially add more to it. So maybe we could come up with some more compact representation here... but not in this PR
* Remove "phenotype/" and "stimuli/" from the "root" of the "study" dataset
Since per se they should be under either "sourcedata/" or "derivatives/" one way or another, even if it is some stimuli-only dataset(s)
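To make the layout discussed in this thread concrete, here is a rough Python sketch that scaffolds an empty "study" dataset: a `dataset_description.json` with `DatasetType` set to `"study"`, a README, and the `code/`, `docs/`, `sourcedata/` and `derivatives/` directories mentioned above. The helper name `scaffold_study` is hypothetical, and the exact set of directories is an assumption drawn from the conversation rather than from the merged schema. The BIDS version is left to the caller, since the thread does not say which release first accepts the "study" type.

```python
import json
from pathlib import Path


def scaffold_study(root, name, bids_version):
    """Create a bare 'study' dataset skeleton (illustrative only)."""
    root = Path(root)
    # Directory names taken from the discussion, not from the schema itself
    for sub in ("code", "docs", "sourcedata", "derivatives"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    (root / "README").touch()
    description = {
        "Name": name,
        "BIDSVersion": bids_version,  # caller supplies a suitable version
        "DatasetType": "study",
    }
    (root / "dataset_description.json").write_text(
        json.dumps(description, indent=2) + "\n"
    )
    return description
```

Raw and derivative datasets would then be added under `sourcedata/` and `derivatives/`, each with its own `dataset_description.json`.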
edit: formerly it was "project" but then renamed to "study" for better alignment
This PR was initially submitted as #1861, but I made the mistake of combining it with a discussion of transforming existing projects' layouts into such a BIDS project dataset. Please refer to that PR for examples, but otherwise let's concentrate here on the discussion of this specific proposed change.
dataset_description.json.
TODOs:
When accepted:
docs/getting_started/folders_and_files/derivatives.md of website. See "Remove example of rawdata/ at the top-level of a BIDS dataset" (bids-website#687)