<strong>External table editor</strong>. Here, the user has right-clicked on the table header and is transforming the contents to Age entries.
10
+
<strong>External table editor</strong>. Users should work through the columns one by one. Successfully transformed columns are shown in green. The following sections explain how to use the tool.
11
11
</figcaption>
12
12
</figure>
13
13
14
14
15
-
## Functionality
15
+
## Excel import
16
+
In the medical genetics literature, clinical data about cohorts is often provided in tabular form, frequently as supplemental Excel files. Use the *Excel (Cols)* button to import Excel files in which
17
+
data for each individual is represented by a column. Use the *Excel (Rows)* button to import Excel files in which
18
+
data for each individual is represented by a row.
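
The difference between the two layouts amounts to a transposition. The following is a minimal Python sketch of that idea, not Phenoboard's own import code; the function name `columns_to_rows` and the sample sheet are hypothetical.

```python
def columns_to_rows(table):
    """Transpose a rectangular table given as a list of rows."""
    return [list(row) for row in zip(*table)]


# Hypothetical sheet in which each individual occupies one column.
by_columns = [
    ["Individual", "P1", "P2"],
    ["Sex", "M", "F"],
    ["Age", "4", "11"],
]

# After transposition, each individual occupies one row.
by_rows = columns_to_rows(by_columns)
```

Here `by_rows` holds a header row followed by one row per individual, which is the layout the *Excel (Rows)* description refers to.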
16
19
17
-
The functions of phenoboard can be explored by right-clicking on column headers or cells.
20
+
> **Warning**
21
+
> Some Excel files encountered in the literature can be difficult to parse. For instance, formatting errors sometimes cause the contents of one cell to "spill over" into the next row. Sometimes, information about two different items
22
+
is presented in the same cell (e.g., Age and Sex). Phenoboard provides functionality to merge and split cells that may help to
23
+
deal with this, but in some cases it may be easier to edit the Excel file manually.
18
24
19
-
## Saving
25
+
The Excel import buttons are disabled if there is no current cohort. Thus, if you are trying to create a cohort from scratch based on an Excel file, you first need to create the cohort data using the [New Cohort](newcohort.md) page.
20
26
21
-
When all columns have been processed, the user can add all rows to the current cohort (which must be previously entered!).
27
+
## Saving and Loading ETL files
28
+
29
+
If you would like to save your work and come back later to finish, use the *Save ETL* button to save the current state of the transformation to an ETL file. Use the *Load ETL* button to load the file and resume work where you left off.
30
+
31
+
> **ETL**: Extract, transform, load
32
+
> ETL is a three-phase computing process where data is extracted from an input source, transformed (including cleaning), and loaded into an output data container. In our case, we are extracting data from the original Excel file, transforming it into HPO terms and other data required for the phenopacket, and loading the transformed data into the Cohort data structure that phenoboard uses to store data about cohorts of individuals (phenopackets).
33
+
34
+
## PMID
35
+
36
+
Assign the PubMed identifier of the article from which the cohort data was taken.
37
+
38
+
## Add to cohort
39
+
40
+
When you are finished transforming each column, the next step is to load the data into the current cohort. This is not
41
+
possible unless all columns have been transformed or marked as "Ignored" (except for the HPO Text Mining column, which is optional).
42
+
43
+
## Transforming columns
44
+
Phenoboard offers a number of functions for transforming columns, each of which can be started from the context menu that appears on right-click. The following sections describe the major functions. Many of the dialogs also provide help that can be accessed by clicking on the **?** symbol.
45
+
46
+
### Demographic information
47
+
Each individual can be annotated with an individual identifier, sex, age of onset, age at last encounter, and deceased status. Optionally, a column with a family identifier can be marked and merged with the individual identifier column. To do this, right-click on the column header and navigate the context menu as shown below.
This entry must be unique in the cohort. For instance, it would be an error to have two rows with the identifier "Individual A".
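
As an illustration of the uniqueness requirement, the following Python sketch reports identifiers that occur more than once in a column. It is not Phenoboard's own validation; the function name is hypothetical, and trimming surrounding whitespace before comparing is an assumption.

```python
from collections import Counter


def duplicated_identifiers(identifiers):
    """Return the identifiers that appear more than once, sorted."""
    # Assumption: surrounding whitespace is not significant.
    counts = Counter(identifier.strip() for identifier in identifiers)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical identifier column with one repeated entry.
column = ["Individual A", "Individual B", "Individual A "]
duplicates = duplicated_identifiers(column)
```

An empty result means every identifier in the column is unique.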
59
+
60
+
### Sex column
61
+
This column is used to specify the biological sex of the individual. A variety of formats are transformed into the phenopacket standard entries:
62
+
- **M**: Male
63
+
- **F**: Female
64
+
- **O**: Other
65
+
- **U**: Unknown (or not recorded, not available)
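
A mapping of this kind can be sketched in Python as follows. The accepted input spellings in `SEX_SYNONYMS` are assumptions chosen for illustration; the formats that Phenoboard actually recognizes may differ.

```python
# Hypothetical input spellings mapped to the four codes listed above.
SEX_SYNONYMS = {
    "m": "M", "male": "M",
    "f": "F", "female": "F",
    "o": "O", "other": "O",
    "u": "U", "unknown": "U", "na": "U", "n/a": "U", "": "U",
}


def normalize_sex(cell):
    """Map a free-text cell to M, F, O, or U; raise on unknown input."""
    key = cell.strip().lower()
    if key not in SEX_SYNONYMS:
        raise ValueError(f"Unrecognized sex entry: {cell!r}")
    return SEX_SYNONYMS[key]


codes = [normalize_sex(cell) for cell in ["Male", " f ", "n/a", "other"]]
```

Raising an error on unrecognized input, rather than silently mapping it to **U**, keeps unexpected spellings visible so they can be corrected by hand.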
66
+
67
+
### Age of onset/Age at last encounter
68
+
Enter the age at which any clinical manifestation related to the disease first appeared (onset) and the age at which the individual was last medically examined (last encounter). The ingest will try to transform the data in the column into a gestational age, HPO term, or ISO 8601 string as appropriate. Ages are commonly given in years as a bare number (without "y" or "years"). In this case, use the "Assume years" option to ingest the data.
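
The ISO 8601 case, including the effect of an "assume years" switch, can be sketched in Python as below. This is an illustration under stated assumptions and not Phenoboard's parser: the function name, the accepted unit spellings, and the error handling are hypothetical, and gestational ages and HPO onset terms are not handled.

```python
import re

# Assumed input style: optional years, months, and days, e.g. "2y 6m" or "3 months".
_AGE_PATTERN = re.compile(
    r"^(?:(\d+)\s*(?:y|years?))?\s*"
    r"(?:(\d+)\s*(?:m|months?))?\s*"
    r"(?:(\d+)\s*(?:d|days?))?$",
    re.IGNORECASE,
)


def age_to_iso8601(cell, assume_years=False):
    """Convert an age cell to an ISO 8601 duration such as P2Y6M."""
    text = cell.strip()
    if re.fullmatch(r"\d+", text):
        # A bare number is ambiguous unless the caller opts in to years.
        if not assume_years:
            raise ValueError(f"Bare number without unit: {cell!r}")
        return f"P{int(text)}Y"
    match = _AGE_PATTERN.match(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"Unrecognized age entry: {cell!r}")
    years, months, days = match.groups()
    parts = [
        f"{int(value)}{unit}"
        for value, unit in ((years, "Y"), (months, "M"), (days, "D"))
        if value is not None
    ]
    return "P" + "".join(parts)


bare_years = age_to_iso8601("4", assume_years=True)
mixed = age_to_iso8601("2y 6m")
```

Without `assume_years=True`, a bare number is rejected rather than guessed at, which mirrors the need to select the option explicitly.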
69
+
70
+
### Deceased
71
+
The data in this column must be in the phenopacket format:
72
+
- **yes**: deceased
73
+
- **no**: alive
74
+
- **na**: unknown/not available
75
+
76
+
Further information about these fields can be found in the [Phenopacket Schema documentation](https://phenopacket-schema.readthedocs.io/en/latest/).