diff --git a/mkdocs.yml b/mkdocs.yml
index 28ba5d5b5a..c7d4f36f81 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -46,6 +46,7 @@ nav:
     - Quantitative MRI: appendices/qmri.md
     - Arterial Spin Labeling: appendices/arterial-spin-labeling.md
     - Cross modality correspondence: appendices/cross-modality-correspondence.md
+    - Phenotypic data guidelines: appendices/phenotype.md
   - Changelog: CHANGES.md
   - The BIDS Starter Kit:
     - Website: https://bids.neuroimaging.io/getting_started/
diff --git a/src/appendices/phenotype.md b/src/appendices/phenotype.md
new file mode 100644
index 0000000000..e67a169060
--- /dev/null
+++ b/src/appendices/phenotype.md
@@ -0,0 +1,424 @@
+# Tabular phenotypic data guidelines
+
+This appendix is a collection of guidelines and examples
+for curating well-organized tabular phenotypic data.
+
+## Guidelines
+
+These guidelines are intended to improve the organization and clarity of
+tabular phenotypic data, such as the participants file, the sessions files,
+and phenotypic and assessment data.
+
+These guidelines are recommendations and are ignored by default during validation.
+You can make them mandatory during validation by setting the
+[`AdditionalValidation` key](../modality-agnostic-files/dataset-description.md#additional-validation)
+to `"Phenotype"` in the `dataset_description.json` file.
+
+### 1. Aggregate data across sessions
+
+Aggregate participant information across all sessions into one TSV file per
+measurement or phenotypic assessment, and store this file in the `phenotype/` directory.
+Demographic information is a special case and MUST be aggregated
+in the `participants.tsv` file at the root level of the dataset.
+It is RECOMMENDED to use the `age` column in the `participants.tsv` file
+to record participant age at every session in longitudinal or multi-session datasets.
+
+### 2. Always pair tabular data with data dictionaries
+
+Each tabular phenotypic data file MUST be prepared as a pair: a tabular file
+in tab-separated values (TSV) format and a corresponding data dictionary
+in JavaScript Object Notation (JSON) format.
+See the [Tabular files section](../common-principles.md#tabular-files) for more information.
+
+### 3. Add `MeasurementToolMetadata` to each tabular phenotypic measurement tool
+
+Whenever possible, it is RECOMMENDED to add `MeasurementToolMetadata` to
+each `phenotype/.json` data dictionary.
+This improves reusability and makes the measurement tool unambiguous.
+See [`MeasurementToolMetadata` in the glossary](../glossary.md#measurementtoolmetadata-metadata) for more information.
+
+### 4. Ensure minimal annotation for phenotypic and assessment data
+
+For phenotypic and assessment data, each measurement tool SHOULD have its own
+aggregated TSV file that collects all subjects, sessions,
+and/or runs of data with one entry per row (where a row is defined by
+the smallest unit of acquisition). In other words:
+
+- Each row MUST start with `participant_id`.
+
+- Each TSV file MUST contain a `session_id` column when
+  multiple [sessions](../glossary.md#session-entities)[1](#footnotes) are present
+  in the dataset, regardless of whether those sessions are in
+  the `phenotype/` data, `sub-