Skip to content
Draft
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 16 additions & 6 deletions src/schema/rules/directories.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -16,14 +16,16 @@
# The special "root" specifier describes the root of the dataset and only defines subdirectories.
# No naming convention applies, and the requirement level and opacity would be superfluous.
#
study:
bids-study:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These top-level keys are DatasetType values. If you want them to be more mnemonic, you'll need some alternative way to match to DatasetType.

I would also recommend against hyphens in keys, as we have heretofore used valid identifiers, which makes traversing the schema using dot notation (rules.directories.study) possible.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry this was not the right place to suggest changes to DatasetType values.
The idea was to introduce bids- prefix in the directory names as an easier way to infer DatasetType similar to sub- and ses- directories. But that would possibly only work for bids-raw and bids-derivative subdirs within a study layout. Not sure if that's ideal at this point, so will simply revert to original naming.

root:
subdirs:
- code
- docs
- derivatives
- bids-raw
- bids-derivatives
- logs
- sourcedata
- phenotype
code:
name: code
level: optional
Expand All @@ -32,8 +34,12 @@ study:
name: docs
level: optional
opaque: true
derivatives:
name: derivatives
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I oppose this change on the grounds that derivatives/ already has a defined meaning, and a different name would imply a different meaning. For example bids-derivatives/ would probably be expected to contain only BIDS-Derivatives datasets, and no nonstandard derivatives (like freesurfer/). If you're not changing the meaning, don't change the name.

Further, using the same directory names as other BIDS dataset types allows for code that recursively indexes datasets to care very little about the specific DatasetType of the current dataset. The proposal to add a raw directory would be an addition, but I think it's sensible to be able to crawl sourcedata/ and derivatives/ the same as any other dataset.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point which I mostly agree. Although, I think what bids prefix would imply is a bit unclear.
Currently since the spec allows nonstandard pipeline output inside a valid BIDS dataset, I didn't think bids-derivatives DatasetType would imply change in that policy / meaning.
But based on earlier point, the proposal of bids suffix doesn't seem ideal here, so will revert to derivatives

bids-raw:
name: bids-raw
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here name would contain the actual intended directory name, while the key is just a value that shows up in subdirs. You could do something like:

Suggested change
bids-raw:
name: bids-raw
bids_raw:
name: bids-raw

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will drop the bids prefix idea and go back to calling it rawdata and rely on docs to explain that this is a raw bids Datatype clarify any confusion with the sourcedata. Does this work?

level: optional
opaque: true
bids-derivatives:
name: bids-derivatives
level: optional
opaque: true
logs:
Expand All @@ -44,8 +50,12 @@ study:
name: sourcedata
level: optional
opaque: true
phenotype:
name: phenotype
level: optional
opaque: false
Comment on lines +53 to +56
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was removed after this discussion: #2185 (comment)

I didn't see a justification for re-adding it, so I want to make sure this isn't slipped in with little consideration.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, should have explained this addition based on offline conversations.
The reason for removing phenotype directory was that it would presumably appear in the raw dataset. But isn't that true for derivatives directory as well? It's a bit confusing having derivatives part of all three DatasetTypes but enforcing phenotype only be part of raw or derivative. In practice, phenotype data curation is likely to be more independent than that of derivatives, which is one of the reasons we prefer it to be at the top level.


raw:
bids-raw:
root:
subdirs:
- code
Expand Down Expand Up @@ -103,7 +113,7 @@ raw:
level: required
opaque: false

derivative:
bids-derivative:
root:
subdirs:
- code
Expand Down
Loading