[ENH] BEP036 - Phenotypic Data Guidelines #2123

ericearl · 2025-05-30T14:09:19Z

The BEP leads can meet as-needed to discuss this BEP PR

Coordinate a meeting by emailing Eric Earl: [email protected].

Communicate on this PR to provide feedback otherwise.

BEP036 brings best-practice guidelines for tabular phenotypic data to the BIDS specification.

- Includes an appendix called `phenotype.md`
- Includes a new `AdditionalValidation` key for `dataset_description.json`; its usage is described in the modality agnostic files sections
- Includes the new option to store `session_id` as the second column in `participants.tsv`

Additional Links

Co-authored-by: Eric Earl [email protected] @ericearl
Co-authored-by: Samuel Guay [email protected] @SamGuay
Co-authored-by: Sebastian Urchs [email protected] @surchs
Co-authored-by: Arshitha Basavaraj [email protected] @Arshitha

Upstream PR

Quick update before merging our PR on surchs fork

BEP036 brings guidelines for best tabular phenotypic data to the BIDS specification.

- Includes an appendix called `phenotype.md`
- Includes admonitions for the guidelines in-line with modality agnostic files sections

---------

Co-authored-by: Eric Earl <[email protected]>
Co-authored-by: Samuel Guay <[email protected]>
Co-authored-by: Sebastian Urchs <[email protected]>
Co-authored-by: Arshitha B <[email protected]>

Changed "e.g." to "for example" to follow contributing style guidelines.

for more information, see https://pre-commit.ci

src/modality-agnostic-files/data-summary-files.md

surchs · 2025-05-30T14:57:31Z

src/appendices/phenotype.md

+each `phenotype/<measurement_tool_name>.json` data dictionary.
+This improves reusability and provides clarity about the measurement tool.
+
+### 5. Use the demographics file for common variables about participants


Copying from https://github.com/surchs/bids-specification/pull/1/files#r2103117486

For this section, would it make sense to suggest that demo-like information be prioritized in this file rather than participants.tsv, making the latter primarily a list of subject IDs? I haven't seen this explicitly addressed anywhere, though I'm unsure if it's something we want to formalize 😬
Something like this could follow the paragraph?:

When all demographic data is stored in phenotype/demographics.tsv, participants.tsv may serve primarily as a minimal listing of subject identifiers with only the participant_id column.

I agree. It'd be good to mention this.
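
As an illustration of the proposed sentence, here is a minimal sketch of reducing `participants.tsv` to an identifier listing while the full table moves to `phenotype/demographics.tsv`. The helper name `split_participants` and the `age` and `sex` columns in the usage note are assumptions made for this example, not part of the PR.

```python
import csv
import io


def split_participants(tsv_text):
    """Return (participants_tsv, demographics_tsv) built from one table.

    participants_tsv keeps only the participant_id column;
    demographics_tsv keeps every column of the input.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(reader)
    all_columns = reader.fieldnames or ["participant_id"]

    def dump(columns):
        out = io.StringIO()
        writer = csv.DictWriter(
            out,
            fieldnames=columns,
            delimiter="\t",
            extrasaction="ignore",  # drop columns not listed in `columns`
            lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(rows)
        return out.getvalue()

    return dump(["participant_id"]), dump(all_columns)
```

Called on a table with `participant_id`, `age`, and `sex` columns, the first return value contains only the `participant_id` column and the second reproduces the full table.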

src/appendices/phenotype.md

src/modality-agnostic-files/data-summary-files.md

Put the phenotypic and assessment data content where it belongs.

src/modality-agnostic-files/data-summary-files.md

src/modality-agnostic-files/phenotypic-and-assessment-data.md

src/appendices/phenotype.md

Missed a column for the Sessions file: run_id.

Missed the session_id column being 2nd for Phenotype.

- Added in a new guideline 7 to encourage the use of participants and sessions files for different uses. - Re-numbered old guidelines 7-9 to 8-10.

Removing excess line I forgot to remove earlier. Thanks remark CI!

surchs · 2025-09-24T22:20:14Z

.vscode/settings.json

I think this is added by accident in surchs@0eba71d @ericearl

Agreed. I will try to get it out of there.

src/modality-agnostic-files/phenotypic-and-assessment-data.md

src/appendices/phenotype.md

surchs · 2025-09-25T00:49:06Z

src/appendices/phenotype.md

+1.  Each row MUST start with `participant_id`.
+
+1.  Each TSV file MUST contain a `session_id` column when
+    multiple [sessions](../glossary.md#session-entities)[^1] are present
+    in the data set regardless of whether those sessions are in
+    the `phenotype/` data, `sub-<label>/` data, or a combination of the two.
+
+1.  If more than one of the same measurement tool is acquired within
+    the same `session_id`, a `run_id` column MUST be added.
+
+1.  To encode the acquisition time for a measurement tool’s `session_id`,
+    add the `session_id` to the sessions file and
+    include the OPTIONAL `acq_time` column.


Suggested change

-1.  Each row MUST start with `participant_id`.
-
-1.  Each TSV file MUST contain a `session_id` column when
-    multiple [sessions](../glossary.md#session-entities)[^1] are present
-    in the data set regardless of whether those sessions are in
-    the `phenotype/` data, `sub-<label>/` data, or a combination of the two.
-
-1.  If more than one of the same measurement tool is acquired within
-    the same `session_id`, a `run_id` column MUST be added.
-
-1.  To encode the acquisition time for a measurement tool’s `session_id`,
-    add the `session_id` to the sessions file and
-    include the OPTIONAL `acq_time` column.
+a.  Each row MUST start with `participant_id`.
+
+b.  Each TSV file MUST contain a `session_id` column when
+    multiple [sessions](../glossary.md#session-entities)[^1] are present
+    in the data set regardless of whether those sessions are in
+    the `phenotype/` data, `sub-<label>/` data, or a combination of the two.
+
+c.  If more than one of the same measurement tool is acquired within
+    the same `session_id`, a `run_id` column MUST be added.
+
+d.  To encode the acquisition time for a measurement tool’s `session_id`,
+    add the `session_id` to the sessions file and
+    include the OPTIONAL `acq_time` column.

Or just regular list? I wouldn't nest one top-level enumeration in another.
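
For reference, the first three rules in the quoted list can be checked mechanically. The sketch below is only an illustration under stated assumptions: `check_phenotype_tsv` is a hypothetical helper, not the BIDS validator, and the caller is assumed to know whether the dataset has multiple sessions. The fourth rule concerns the sessions file and is not checked here.

```python
import csv
import io
from collections import Counter


def check_phenotype_tsv(tsv_text, multi_session):
    """Return a list of rule violations for one phenotype TSV (hypothetical helper)."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])
    rows = [dict(zip(header, values)) for values in reader]
    problems = []

    # Rule 1: every row starts with participant_id, so it must be the first column.
    if not header or header[0] != "participant_id":
        problems.append("first column must be participant_id")

    # Rule 2: session_id is required as soon as the dataset has multiple sessions.
    if multi_session and "session_id" not in header:
        problems.append("session_id column is required for multi-session datasets")

    # Rule 3: a repeated (participant, session) pair means the tool was
    # acquired more than once in that session, which calls for run_id.
    counts = Counter((row.get("participant_id"), row.get("session_id")) for row in rows)
    if any(n > 1 for n in counts.values()) and "run_id" not in header:
        problems.append("run_id column is required when a tool is repeated within a session")

    return problems
```

An empty list means none of the three checks fired; it does not mean the file is valid BIDS.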

surchs · 2025-09-25T02:20:34Z

src/modality-agnostic-files/data-summary-files.md

+Optional: Yes
+
+An aggregated sessions file CAN be provided at the dataset root.


It's not really optional though, right? Or rather, it's only optional in the sense that you could choose the other option and make subject-level sessions.tsv files.

As mentioned above: I would take a stance here and make one of the two options the recommended one. To me that would be the root-level file. And then we can say a word on why we recommend the option.
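
To make the two options concrete, here is a minimal sketch of how participant-level sessions files could be combined into one root-level file. `aggregate_sessions` is a hypothetical helper, and the input is assumed to be a mapping from participant label to the text of that participant's sessions file; cells a participant does not have are filled with `n/a`.

```python
import csv
import io


def aggregate_sessions(per_participant):
    """Combine participant-level sessions tables into one root-level table.

    per_participant maps 'sub-<label>' to that participant's sessions.tsv text.
    """
    columns = ["participant_id", "session_id"]
    rows = []
    for participant_id in sorted(per_participant):
        reader = csv.DictReader(io.StringIO(per_participant[participant_id]), delimiter="\t")
        for row in reader:
            # Collect any extra columns in the order they are first seen.
            for name in row:
                if name not in columns:
                    columns.append(name)
            rows.append({"participant_id": participant_id, **row})

    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=columns,
        delimiter="\t",
        restval="n/a",  # missing cells become n/a
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

The result keeps `participant_id` first and `session_id` second, so a single file can describe every session in the dataset.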

surchs · 2025-09-25T02:23:00Z

src/modality-agnostic-files/data-summary-files.md

 ## Sessions file

-Template:
+### Option 1: Segregated sessions files


Can't comment on the main heading for Sessions file:

There is a lot of commonality between the root-level and subject-level sessions.tsv, that is, everything about what kind of info should go in them and what they are for. So how about we pull that info up under the heading, and then only explain the differences in the two options sections?

surchs · 2025-09-25T02:24:18Z

src/modality-agnostic-files/data-summary-files.md

+`sessions.json` example:
+
+```JSON
+{
+    "participant_id": {
+        "Description": "Participant identifier"
+    },
+    "session_id": {
+        "Description": "Session identifier for the session",
+        "Levels": {
+            "ses-predrug": "session before drug administration",
+            "ses-postdrug": "session after drug administration",
+            "ses-followup": "follow-up session"
+        }
+    },
+    "acq_time": {
+        "Description": "Acquisition time of the session"
+    },
+    "systolic_blood_pressure": {
+        "Description": "Systolic blood pressure measured at the beginning of the session in mmHg"
+    }
+}
+```


as mentioned above: this should go under the Sessions File heading - right now the example table of which columns to put in a sessions.tsv is listed under the "segregated" option, but the data dictionary under the "aggregated" option
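
Whichever heading it ends up under, the relationship between the table and its data dictionary is the same. The sketch below shows one way to compare them, assuming a hypothetical helper `check_against_dictionary` that receives the text of both files: it flags columns with no dictionary entry and values missing from a column's `Levels`.

```python
import csv
import io
import json


def check_against_dictionary(tsv_text, sidecar_text):
    """Compare a sessions table with its JSON data dictionary (hypothetical helper)."""
    sidecar = json.loads(sidecar_text)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    problems = []

    # Every column should be described in the data dictionary.
    for column in reader.fieldnames or []:
        if column not in sidecar:
            problems.append(f"column {column!r} has no entry in the data dictionary")

    # Values of a column with "Levels" should be one of the listed levels.
    # The header is line 1, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        for column, value in row.items():
            levels = sidecar.get(column, {}).get("Levels")
            if levels is not None and value != "n/a" and value not in levels:
                problems.append(
                    f"line {line_number}: {value!r} is not a listed level of {column!r}"
                )

    return problems
```

With the `sessions.json` example above, a row whose `session_id` is not `ses-predrug`, `ses-postdrug`, or `ses-followup` would be reported.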

src/modality-agnostic-files/data-summary-files.md

src/schema/objects/files.yaml

Accidental file.

Added in easily-agreeable suggestions in a batch. Co-authored-by: Sebastian Urchs <[email protected]>

src/schema/rules/tabular_data/modality_agnostic.yaml

@surchs

Attempt to address more of @surchs comments.

Thanks for catching that excess newline, remark!

Remove acq_time as a phenotype column recommendation/option, as it should go into the sessions file instead.

src/schema/objects/columns.yaml

Remove acq_time__phenotype from columns.yaml since it was removed from the rest of the schema.

Accept Sebastian's suggestion about the phrasing of guideline 8. Co-authored-by: Sebastian Urchs <[email protected]>

src/modality-agnostic-files/data-summary-files.md

Changing "subject-level" to "participant-level" in sessions files section.

To better differentiate demographic data from phenotypic data

Made changes to align with final feedback prior to community review.

ericearl · 2025-10-12T13:05:53Z

@effigies @rwblair Here is a blurb for the community review period to make announcements easier. If edits are needed, I will apply them directly to this comment before tomorrow.

Community Review: BEP036 - Phenotypic Data Guidelines

We are pleased to announce the community review period for BIDS Extension Proposal (BEP) 036!

BEP036 extends the BIDS standard to include an appendix with 10 tabular phenotypic data guidelines you can opt into for the BIDS validator. We have developed the extension to allow everyone to follow good practices in preparing their tabular phenotypic data. Additionally, this BEP introduces the ability to include `session_id` as a second column in participants files and to aggregate sessions files at the root level, allowing you to store longitudinal tabular data about participants and sessions, respectively, inside those files.

The draft specification may be found at: https://bids-specification--2123.org.readthedocs.build/en/2123/
The proposed changes may be found at bids-specification pull request #2123.
Example datasets may be found under the titles pheno001 through pheno006 in the bids-examples pull request #465.

To view the file differences in either pull request, click the "Files changed" tab.

ericearl and others added 4 commits May 20, 2025 08:24

Merge pull request #2 from bids-standard/master

3cedc86

Upstream PR

Merge pull request #3 from bids-standard/master

11fbb47

Quick update before merging our PR on surchs fork

Update phenotype.md and data-summary-files.md

0a640e6

Changed "e.g." to "for example" to follow contributing style guidelines.

ericearl requested review from effigies and rwblair May 30, 2025 14:09

ericearl assigned surchs, ericearl and SamGuay May 30, 2025

ericearl requested review from DimitriPapadopoulos and erdalkaraca as code owners May 30, 2025 14:09

ericearl added the enhancement, BEP, and phenotype labels May 30, 2025

[pre-commit.ci] auto fixes from pre-commit.com hooks

a19512b

for more information, see https://pre-commit.ci

effigies reviewed May 30, 2025

src/modality-agnostic-files/data-summary-files.md

src/modality-agnostic-files/data-summary-files.md Outdated

surchs reviewed May 30, 2025
