[ENH] BEP036 - Phenotypic Data Guidelines #2123
Changes from 28 commits
3cedc86
11fbb47
0ef9fdf
0a640e6
a19512b
5718888
8f54e94
94cb476
142c460
8b78359
60f712a
e62b5cc
ac097aa
32fedd0
aacda9b
fd5ff2d
dd65b5e
f4205e8
0eba71d
d3631a8
f4939ad
abd5c2b
ec2c53d
7639001
8b38859
d1141a0
6c6ee8b
9f8afec
8fa89bc
ff86669
ede68ef
e8ab5dd
3490e9d
d02e0bf
f8d6333
bd083c0
ec2703b
00d8f25
41f0f70
cdfc0d2
6cbb4ee
fe3ddab
40f6751
2fd12d7
a0cab8b
c0bd78a
97917f0
80683f6
76932fe
32f994e
3a602ca
f8d492e
7f1eb09
5c55eb9
c0951f3
69183c7
@@ -0,0 +1,5 @@
{
  "githubPullRequests.ignoredPullRequestBranches": [
    "master"
  ]
}
@@ -0,0 +1,332 @@
# Tabular phenotypic data guidelines

This appendix is a collection of guidelines and examples
for creating well-organized aggregated tabular phenotypic data.

## Guidelines

These guidelines all apply when the
[`AdditionalValidation` key](../modality-agnostic-files/dataset-description.md#additional-validation)
contains `"Phenotype"` in the `dataset_description.json`.
They are intended to improve the organization and clarity of
tabular phenotypic data like the participants file, sessions file,
and phenotypic and assessment data.
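
As a minimal sketch of the condition above, the check below decides whether these guidelines apply to a dataset. It assumes that `AdditionalValidation` holds a list of strings (a bare string is tolerated); the function name `phenotype_guidelines_apply` and the example content are hypothetical and not part of the proposal.

```python
import json


def phenotype_guidelines_apply(dataset_description: dict) -> bool:
    """Return True when AdditionalValidation contains "Phenotype"."""
    value = dataset_description.get("AdditionalValidation", [])
    # Assumption: the key holds a list of strings; a single string is
    # wrapped so that both shapes are handled the same way.
    if isinstance(value, str):
        value = [value]
    return "Phenotype" in value


# Hypothetical dataset_description.json content, for illustration only.
example = json.loads(
    '{"Name": "Example dataset", "AdditionalValidation": ["Phenotype"]}'
)
print(phenotype_guidelines_apply(example))
```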

### 1. Aggregate data across sessions

Aggregation refers to the contents of the TSV file. It is REQUIRED
to collect all participant data into one TSV per tabular phenotypic file.
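
One possible way to aggregate is sketched below: rows that were collected separately per participant are merged into a single TSV. The function name `aggregate_rows`, the column names, and the use of `n/a` for values missing from a row are assumptions made for illustration.

```python
import csv
import io


def aggregate_rows(per_participant_rows):
    """Merge several lists of row dicts into one TSV string."""
    all_rows = [row for rows in per_participant_rows for row in rows]
    # Build the header from the union of keys, in first-seen order.
    columns = []
    for row in all_rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        delimiter="\t",
        restval="n/a",  # assumption: absent values are written as n/a
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(all_rows)
    return buffer.getvalue()


# Hypothetical per-participant data for one measurement tool.
print(aggregate_rows([
    [{"participant_id": "sub-01", "score": "3"}],
    [{"participant_id": "sub-02", "score": "5"}],
]))
```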

### 2. Always pair tabular data with data dictionaries

Tabular phenotypic data MUST be prepared as one pair of a tabular file
in tab-separated value (TSV) format and a corresponding data dictionary
in JavaScript Object Notation (JSON) format.
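
The pairing rule could be checked roughly as follows. This sketch assumes that each data dictionary sits next to its TSV file and shares its name apart from the extension; the function name `tsv_without_dictionary` and the file names are hypothetical.

```python
from pathlib import PurePosixPath


def tsv_without_dictionary(filenames):
    """Return the TSV files that have no same-named JSON data dictionary."""
    paths = [PurePosixPath(name) for name in filenames]
    # Paths of all JSON files with the extension removed.
    json_stems = {p.with_suffix("") for p in paths if p.suffix == ".json"}
    return sorted(
        str(p)
        for p in paths
        if p.suffix == ".tsv" and p.with_suffix("") not in json_stems
    )


# Hypothetical file listing: the second tool lacks its data dictionary.
print(tsv_without_dictionary([
    "phenotype/tool_a.tsv",
    "phenotype/tool_a.json",
    "phenotype/tool_b.tsv",
]))
```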

### 3. Add `MeasurementToolMetadata` to each tabular phenotypic measurement tool

Whenever possible, it is RECOMMENDED to add `MeasurementToolMetadata` to
each `phenotype/<measurement_tool_name>.json` data dictionary.
This improves reusability and provides clarity about the measurement tool.
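
A rough sketch of adding this metadata to an existing data dictionary is shown below. The subfields `Description` and `TermURL` are assumptions here, since this section does not list the contents of `MeasurementToolMetadata`; the helper name, the column name, and the URL are hypothetical placeholders.

```python
import json


def add_measurement_tool_metadata(data_dictionary, description, term_url=None):
    """Return a copy of a data dictionary with MeasurementToolMetadata set."""
    # Assumption: the metadata object carries a Description and,
    # optionally, a TermURL pointing at an ontology term.
    metadata = {"Description": description}
    if term_url is not None:
        metadata["TermURL"] = term_url
    updated = dict(data_dictionary)
    updated["MeasurementToolMetadata"] = metadata
    return updated


# Hypothetical data dictionary with a single column description.
dictionary = {"score_total": {"Description": "Total score (hypothetical column)"}}
updated = add_measurement_tool_metadata(
    dictionary,
    "Hypothetical Questionnaire, version 1",
    "https://example.org/terms/hypothetical-questionnaire",
)
print(json.dumps(updated, indent=2))
```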

### 4. Ensure minimal annotation for phenotypic and assessment data

In phenotypic and assessment data each measurement tool SHOULD have an independent
aggregated data TSV file in which the user collects all subjects, sessions,
and/or runs of data as one entry per row (with a row defined by
the smallest unit of acquisition). In other words:

1. Each row MUST start with `participant_id`.

1. Each TSV file MUST contain a `session_id` column when
   multiple [sessions](../glossary.md#session-entities)[^1] are present
   in the data set regardless of whether those sessions are in
   the `phenotype/` data, `sub-<label>/` data, or a combination of the two.

1. If more than one of the same measurement tool is acquired within
   the same `session_id`, a `run_id` column MUST be added.

1. To encode the acquisition time for a measurement tool’s `session_id`,
   add the `session_id` to the sessions file and
   include the OPTIONAL `acq_time` column.
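
The first three rules in this list could be checked with something like the sketch below. It is not the validator's implementation: the function name `check_phenotype_tsv`, the message strings, and the way the caller signals that the dataset has multiple sessions are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def check_phenotype_tsv(tsv_text, dataset_has_multiple_sessions):
    """Return a list of problems found in one phenotype TSV."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows or not rows[0]:
        return ["file has no header row"]
    header, body = rows[0], rows[1:]
    problems = []
    # Rule 1: participant_id is the first column of every row.
    if header[0] != "participant_id":
        problems.append("first column is not participant_id")
    # Rule 2: session_id is needed when the dataset has several sessions.
    if dataset_has_multiple_sessions and "session_id" not in header:
        problems.append("session_id column is missing")
    # Rule 3: repeated participant/session combinations need run_id.
    if "participant_id" in header and "run_id" not in header:
        keys = [c for c in ("participant_id", "session_id") if c in header]
        idx = [header.index(c) for c in keys]
        counts = Counter(
            tuple(row[i] for i in idx)
            for row in body
            if len(row) == len(header)  # skip blank or malformed rows
        )
        if any(n > 1 for n in counts.values()):
            problems.append("repeated measurements need a run_id column")
    return problems


# Hypothetical table: the same tool was acquired twice in ses-01.
table = (
    "participant_id\tsession_id\tscore\n"
    "sub-01\tses-01\t3\n"
    "sub-01\tses-01\t4\n"
)
print(check_phenotype_tsv(table, dataset_has_multiple_sessions=True))
```

The fourth rule concerns the sessions file rather than the phenotype TSV, so it is left out of this sketch.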
a. Each row MUST start with `participant_id`.
b. Each TSV file MUST contain a `session_id` column when
   multiple [sessions](../glossary.md#session-entities)[^1] are present
   in the data set regardless of whether those sessions are in
   the `phenotype/` data, `sub-<label>/` data, or a combination of the two.
c. If more than one of the same measurement tool is acquired within
   the same `session_id`, a `run_id` column MUST be added.
d. To encode the acquisition time for a measurement tool’s `session_id`,
   add the `session_id` to the sessions file and
   include the OPTIONAL `acq_time` column.
Or just regular list? I wouldn't nest one top-level enumeration in another.
I'd say this does not have to be a sub-heading. How about just having the table here with no further context? Or if context is needed, just plaintext.
I think this is sufficiently clear from the table - we could add a point on `run_id` in the bullet points above the table. If you think more context would be good here, how about an example instead? The text is making me double take a bit, so I'd prefer to remove
How about we turn this into a general section on what is "demographic" information (and thus goes into `participants.tsv`) and what is "phenotypic" information and thus goes into `/phenotype`? Now that `participants.tsv` supports `session_id` (and thus multi-row per participant), this would be good to make very clear
Similar to point 7 essentially
I don't know how to squeeze it in explicitly without an 11th guideline. Do you have any suggestions you can make directly in a commit or a PR comment suggestion?
I agree with this, but I find it a bit confusingly worded.
> store the time of acquisition[^2] of each row inside a column named `acq_time` in the sessions file.

Essentially what we're saying is: please record the `acq_time` for all sessions. And when you do, put that in the `sessions.tsv`
I think this is added by accident in surchs@0eba71d @ericearl
Agreed. I will try to get it out of there.