Ambiguities in specification RE Inheritance Principle #2156

@Lestropie

Related to #2155.

This list covers aspects of the specification where it remains unclear to me whether the specification text itself is fundamentally ambiguous. Some of these arise from a discrepancy between the validation outcomes I expected for new testing datasets (bids-standard/bids-examples#504) and the actual outcomes, but I cannot definitively attribute fault to the validator and am therefore posting in the main specification repo.

1. Validity of filenames based on parent directory

In version 1.1.2, the Principle states:

Files for a particular participant can exist only at participant level directory, i.e /dataset/sub-*[/ses-*]/sub-*_T1w.json.
Similarly, any file that is not specific to a participant is to be declared only at top level of dataset for eg: task-sist_bold.json must be placed under /dataset/task-sist_bold.json.

I find this slightly ambiguous, as it does not properly address the influence of the "ses" entity, or what is / is not classified as a "participant level directory".
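To make the ambiguity concrete, here is a minimal sketch of just one possible reading of the 1.1.2 text. The function name `placement_ok_v112` is hypothetical, paths are assumed to be relative to the dataset root, and the choice to accept any depth below `sub-*/` (including `ses-*/` and modality directories) is my assumption, not something the quoted text settles.

```python
from pathlib import PurePosixPath


def placement_ok_v112(path):
    """One possible reading of the 1.1.2 wording (an assumption, not the spec).

    `path` is relative to the dataset root.
    """
    p = PurePosixPath(path)
    stem = p.name.split(".", 1)[0]
    # Look for a "sub-<label>" entity in the filename.
    subject = next((c for c in stem.split("_") if c.startswith("sub-")), None)
    if subject is None:
        # Not participant-specific: only the dataset root is allowed.
        return p.parent == PurePosixPath(".")
    # Participant-specific: must sit somewhere under that participant's directory.
    return len(p.parts) > 1 and p.parts[0] == subject


print(placement_ok_v112("task-sist_bold.json"))             # True
print(placement_ok_v112("sub-01/task-sist_bold.json"))      # False
print(placement_ok_v112("sub-01/ses-01/sub-01_T1w.json"))   # True
```

Under this reading the check depends only on the metadata file itself, never on which data files exist elsewhere in the dataset.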

In #946 for 1.7.0, the approximately corresponding rule is:

A metadata file MUST NOT have a filename that would be otherwise applicable to some data file based on rules 2.b and 2.c but is made inapplicable based on its location in the directory structure as per rule 2.a.

While this is harder to convey, it was intended to better encapsulate why such filesystem hierarchy placements are problematic. From a validation perspective, however, these are two different rules. I have datasets where changing only "BIDSVersion" from 1.1.2 to 1.7.0 changes, IMO, whether or not the dataset should pass validation (even if the validator outcomes don't currently reflect that; but that's a discussion for the validator repository).

I do think that the 1.7.0 rule is more extensible than the 1.1.2 rule, especially if we were to adopt e.g. BIDS 2.0 #54. But it is more expensive to validate, since computationally it scales quadratically. I'm open to discussion on whether the community would prefer that BIDS 1.0 be more explicit about separation of subject-specific vs. subject-agnostic files.
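A rough sketch of a literal reading of the 1.7.0 rule, to show where the pairwise cost comes from. Everything here is illustrative: the helper names are hypothetical, filenames are assumed to follow the simplified pattern `key-value[_key-value...]_suffix.ext`, and I am taking rule 2.a to mean "same directory or an ancestor", and rules 2.b / 2.c to mean "same suffix, and no entity absent from the data filename".

```python
from pathlib import PurePosixPath


def parse(path):
    """Split a path into (directory, entities, suffix); simplified pattern only."""
    p = PurePosixPath(path)
    stem = p.name.split(".", 1)[0]
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    return p.parent, entities, suffix


def name_matches(meta, data):
    """Assumed rules 2.b and 2.c: same suffix, entities are a subset."""
    _, m_ent, m_suf = parse(meta)
    _, d_ent, d_suf = parse(data)
    return m_suf == d_suf and m_ent.items() <= d_ent.items()


def location_ok(meta, data):
    """Assumed rule 2.a: metadata is in the data file's directory or above it."""
    m_dir, d_dir = parse(meta)[0], parse(data)[0]
    return m_dir == d_dir or m_dir in d_dir.parents


def misplaced(meta_files, data_files):
    """Metadata files whose name matches some data file but whose location does not."""
    return [
        m for m in meta_files
        # Every metadata file is compared against every data file.
        if any(name_matches(m, d) and not location_ok(m, d) for d in data_files)
    ]


data = ["sub-01/anat/sub-01_T1w.nii.gz", "sub-02/anat/sub-02_T1w.nii.gz"]
meta = ["T1w.json", "sub-01/anat/T1w.json"]
print(misplaced(meta, data))  # ['sub-01/anat/T1w.json']
```

In this sketch `sub-01/anat/T1w.json` is flagged only because `sub-02` has a matching data file; the verdict depends on the whole dataset, which is what makes the check pairwise.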

2. Exclusive non-sidecar pairings

I discovered an unusual data file / metadata file relationship that may not behave as expected. Imagine a dataset with just one data file and one metadata file. The metadata file applies to the data file because its entities are a subset of the data file's entities. But they are a strict subset; that is, it is not a "sidecar" file that has exactly the same stem and differs only in file extension. See example dataset "ipexclnonsc".
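The distinction between a sidecar and a strict-subset pairing can be sketched as follows. The filenames are hypothetical and are not taken from "ipexclnonsc"; the function names are mine, directory placement is ignored, and the same-suffix requirement is my assumption about how applicability works.

```python
from pathlib import PurePosixPath


def split_name(path):
    """Return (set of key-value entities, suffix) for a simplified BIDS filename."""
    stem = PurePosixPath(path).name.split(".", 1)[0]
    *pairs, suffix = stem.split("_")
    return set(pairs), suffix


def relationship(meta, data):
    """Classify a metadata/data pairing by filename only (location is ignored)."""
    m_ent, m_suf = split_name(meta)
    d_ent, d_suf = split_name(data)
    if m_suf != d_suf or not m_ent <= d_ent:
        return "not applicable"
    # Identical entity sets mean the stems match: a conventional sidecar.
    return "sidecar" if m_ent == d_ent else "strict subset"


data_file = "sub-01/anat/sub-01_acq-highres_T1w.nii.gz"  # hypothetical
print(relationship("sub-01/anat/sub-01_acq-highres_T1w.json", data_file))  # sidecar
print(relationship("sub-01/anat/sub-01_T1w.json", data_file))              # strict subset
```

The case described above is the second one: the only metadata file in the dataset pairs with the only data file, yet is not its sidecar.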

Unless I've missed something in the specification, there is nothing explicitly precluding this. There's a good argument to be made that it should result in a warning, as there's no strong justification for such a structure. Currently both validators deem this a violation.

3. Unused metadata files

Because of the Inheritance Principle, a dataset cannot be deemed non-conformant just because a key-value JSON file is found that has no corresponding data file differing only in file extension. It would however be "problematic" to some degree if such a file exists and is not inherited by any data file through the IP. So this IMO should be at least a warning, and probably more suitably a violation. Currently, both validators deem this a violation.

Where I am stuck, however, is in finding a statement in the specification that currently defines this as a violation.
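For reference, the condition I have in mind can be sketched like this. It is only an illustration of the idea, not a description of what either validator does: the names are hypothetical, filenames are assumed to follow `key-value[_key-value...]_suffix.ext`, and applicability is assumed to mean "same directory or above, same suffix, entities a subset".

```python
from pathlib import PurePosixPath


def parse(path):
    """Split a path into (directory, set of key-value entities, suffix)."""
    p = PurePosixPath(path)
    *pairs, suffix = p.name.split(".", 1)[0].split("_")
    return p.parent, set(pairs), suffix


def applies(meta, data):
    """Assumed applicability test: location, suffix and entity subset."""
    m_dir, m_ent, m_suf = parse(meta)
    d_dir, d_ent, d_suf = parse(data)
    in_scope = m_dir == d_dir or m_dir in d_dir.parents
    return in_scope and m_suf == d_suf and m_ent <= d_ent


def unused_metadata(meta_files, data_files):
    """Metadata files that no data file would inherit from."""
    return [m for m in meta_files if not any(applies(m, d) for d in data_files)]


data = ["sub-01/func/sub-01_task-rest_bold.nii.gz", "sub-01/anat/sub-01_T1w.nii.gz"]
meta = ["task-rest_bold.json", "sub-01/anat/sub-01_T2w.json"]
print(unused_metadata(meta, data))  # ['sub-01/anat/sub-01_T2w.json']
```

Here `task-rest_bold.json` has no same-stem data file but is still used via inheritance, whereas `sub-01_T2w.json` is used by nothing.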
