[ENH] BEP036 - Phenotypic Data Guidelines #2123
@@ -0,0 +1,331 @@
# Tabular phenotypic data guidelines

This appendix is a collection of guidelines and examples for creating well-organized aggregated tabular phenotypic data.

## Guidelines

These guidelines are all **RECOMMENDED** when preparing
tabular phenotypic data such as the
participants file, sessions file, demographics file,
or phenotypic and assessment data.
The keywords REQUIRED, MUST, and similar are used below to state
what is needed to comply with these **RECOMMENDED** guidelines.

||||||||||||||||||||||||||||||||||||||||||||||||||||
### 1. Always pair tabular data with data dictionaries | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
Tabular phenotypic data MUST be prepared as one pair of a tabular file | ||||||||||||||||||||||||||||||||||||||||||||||||||||
in tab-separated value (TSV) format and a corresponding data dictionary | ||||||||||||||||||||||||||||||||||||||||||||||||||||
in JavaScript Object Notation (JSON) format. | ||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
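As a rough illustration of the pairing above, the sketch below lists TSV columns that have no entry in the accompanying JSON data dictionary. The file contents, the column names, and the helper `undocumented_columns` are all hypothetical and only serve the example; this is not an official validation routine.

```python
import csv
import io
import json

# Hypothetical in-memory stand-ins for a TSV file and its JSON data dictionary.
TSV_TEXT = "participant_id\tage\tsex\nsub-01\t34\tF\nsub-02\t29\tM\n"
JSON_TEXT = json.dumps(
    {
        "age": {"Description": "Age of the participant", "Units": "year"},
        "sex": {"Description": "Sex of the participant", "Levels": {"F": "female", "M": "male"}},
    }
)


def undocumented_columns(tsv_text: str, json_text: str) -> list[str]:
    """Return the TSV column names that have no entry in the data dictionary."""
    header = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    dictionary = json.loads(json_text)
    return [name for name in header if name not in dictionary]


# In this made-up pair, `participant_id` is the only column left undescribed.
print(undocumented_columns(TSV_TEXT, JSON_TEXT))
```

A check like this can be run before sharing a dataset to spot columns that still need a description.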
### 2. Aggregate data across sessions

Aggregation refers to the contents of the TSV file. It is REQUIRED
to collect all participant data into one TSV per tabular phenotypic file.

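One way to picture aggregation is the sketch below, which merges per-session score tables for a single measurement tool into one TSV. The input layout, the `score` column, and the `aggregate` helper are assumptions made for illustration; the `session_id` column follows the naming used elsewhere in these guidelines.

```python
import csv
import io

# Hypothetical per-session tables for one measurement tool.
PER_SESSION = {
    "ses-01": [
        {"participant_id": "sub-01", "score": "12"},
        {"participant_id": "sub-02", "score": "15"},
    ],
    "ses-02": [
        {"participant_id": "sub-01", "score": "14"},
    ],
}


def aggregate(tables: dict[str, list[dict[str, str]]]) -> str:
    """Combine per-session rows into one TSV string, one row per participant and session."""
    rows = [
        {"participant_id": row["participant_id"], "session_id": session, "score": row["score"]}
        for session, session_rows in tables.items()
        for row in session_rows
    ]
    rows.sort(key=lambda r: (r["participant_id"], r["session_id"]))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["participant_id", "session_id", "score"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


print(aggregate(PER_SESSION))
```

The result is a single table in which `sub-01` appears on two rows, one per session, instead of being split across two files.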
### 3. Ensure minimal annotation for phenotypic and assessment data

In phenotypic and assessment data, each measurement tool has an independent
aggregated data TSV file in which the user collects all subjects, sessions,
and/or runs of data as one entry per row (with a row defined by
the smallest unit of acquisition). In other words:

1. Each row MUST start with `participant_id`.
2. Each TSV file MUST contain a `session_id` column when
    multiple [sessions](../glossary.md#session-entities)[^1] are present
    in the dataset regardless of whether those sessions are in
    the `phenotype/` data, `sub-<label>/` data, or a combination of the two.
3. If more than one of the same measurement tool is acquired within
    the same `session_id`, a `run` column MUST be added.
4. To encode the acquisition time for a measurement tool’s `session_id`,
    add the `session_id` to the sessions file and
    include the OPTIONAL `acq_time` column.

To summarize this guideline as a table:

| **Column name**  | **Requirement** | **Description** |
| :--------------- | :-------------- | :-------------- |
| `participant_id` | REQUIRED | MUST be the first column in the file. Note that data for one participant MAY be represented across multiple rows in case of multiple sessions or runs, and therefore the entry in the `participant_id` column will be repeated. |
| `session_id` | CONDITIONAL; if sessions are defined in the dataset | A `session_id` column MUST be added to all tabular files in the phenotype directory as soon as multiple sessions are present in the dataset regardless of whether those sessions are in the `phenotype/` data, `sub-<label>/` data, or a combination of the two. |
| `run` | CONDITIONAL; if there are multiple runs within any session | A chronological `run` number is used when a measurement tool or assessment described by a tabular file was repeated within a session. |
| `acq_time` | OPTIONAL | If acquisition time is available, the `acq_time` column MAY be used to record the time of acquisition of each row in the tabular file. |

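The column rules in this guideline could be checked roughly as sketched below. The function `check_phenotype_tsv`, its `multiple_sessions` flag, and the problem messages are hypothetical; the sketch covers only the `participant_id`, `session_id`, and `run` rules and does not stand in for the BIDS validator.

```python
import csv
import io


def check_phenotype_tsv(tsv_text: str, multiple_sessions: bool) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    columns = reader.fieldnames or []
    rows = list(reader)
    problems = []

    if not columns or columns[0] != "participant_id":
        problems.append("participant_id must be the first column")
    if multiple_sessions and "session_id" not in columns:
        problems.append("session_id column is missing")

    # A repeated (participant, session) pair means the tool was acquired
    # more than once within a session, so a `run` column is needed.
    if "run" not in columns:
        seen = set()
        for row in rows:
            key = (row.get("participant_id"), row.get("session_id"))
            if key in seen:
                problems.append("run column is missing for repeated measurements")
                break
            seen.add(key)
    return problems


GOOD = "participant_id\tsession_id\trun\tscore\nsub-01\tses-01\t1\t12\nsub-01\tses-01\t2\t13\n"
BAD = "session_id\tparticipant_id\tscore\nses-01\tsub-01\t12\nses-01\tsub-01\t13\n"

print(check_phenotype_tsv(GOOD, multiple_sessions=True))
print(check_phenotype_tsv(BAD, multiple_sessions=True))
```

The second table is flagged twice: `participant_id` is not the first column, and the same participant and session appear on two rows without a `run` column to tell them apart.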
Participants:
  selectors:
    - path == "/participants.tsv"
  initial_columns:
    - participant_id
  columns:
    participant_id:
      level: required
      description_addendum: |
        There MUST be exactly one row for each participant.
    species: recommended
    age: recommended
    sex: recommended
    handedness: recommended
    strain: recommended
    strain_rrid: recommended
  index_columns: [participant_id]
  additional_columns: allowed
bids-specification/src/modality-agnostic-files/data-summary-files.md
Lines 22 to 28 in ac11483
<!-- This block generates a columns table.
The definitions of these fields can be found in
src/schema/rules/tabular_data/*.yaml
and a guide for using macros can be found at
https://github.com/bids-standard/bids-specification/blob/master/macros_doc.md
-->
{{ MACROS___make_columns_table("modality_agnostic.Participants") }}
Copying from https://github.com/surchs/bids-specification/pull/1/files#r2103117486
For this section, would it make sense to suggest that demo-like information be prioritized in this file rather than `participants.tsv`, making the latter primarily a list of subject IDs? I haven't seen this explicitly addressed anywhere, though I'm unsure if it's something we want to formalize 😬
Something like this could follow the paragraph?:
When all demographic data is stored in `phenotype/demographics.tsv`, `participants.tsv` may serve primarily as a minimal listing of subject identifiers with only the `participant_id` column.
I agree. It'd be good to mention this.
I agree with this, but I find it a bit confusingly worded.
store the time of acquisition[^2] of each row inside a column named `acq_time` in the sessions file.
Essentially what we're saying is: please record the `acq_time` for all sessions. And when you do, put that in the `sessions.tsv`