Commit c3c0435

Merge pull request #229 from NESCent/paper-edits-20190325
Paper edits
2 parents b203bb0 + ca085d3 commit c3c0435

1 file changed, +3 -11 lines changed

paper.md

Lines changed: 3 additions & 11 deletions
@@ -19,19 +19,13 @@ bibliography: paper.bib
 
 # Summary
 
-TraitDB is a web application built to facilitate storage, searching, subsetting, and sharing trait data from multiple sources.
-
-Many software packages are available to store structured and unstructured relational data alike. From graphical spreadsheets to query-powered database engines, these software excel at the direct ingesting, storing, and vending homogenous datasets.
-
-When synthesizing multiple heterogenous datasets together, these software can still serve as vital infrastructure, but by design do not enforce requirements on organization, categorization, or uniformity.
-
-As a result, well-intentioned efforts to maintain data integrity can easily fall short in the face of building complicated validation and transformation routines to fit datasets into a common schema.
+TraitDB is a web application that synthesizes phenotypic trait datasets from varying structures and formats. It organizes data into a consistent format and taxonomy, facilitating querying, subsetting, and sharing over a range of user criteria. While many other software packages exist to store structured and unstructured relational data, TraitDB is designed to meet the needs of research teams by fitting into their data gathering practices. Commonly-used packages like spreadsheets and SQL databases may excel at direct ingest, storage, and management of homogeneous datasets, but fall short dealing with real-world heterogenous data structures and formats. They require dedicated effort to build and adapt complex processes in order to maintain data integrity.
 
 Through work with multiple [NESCent](http://nescent.org/) working groups, we observed and supported real-world efforts to build a database of traits by collecting and organizing thousands of trait data observations. We evaluated software including [OpenRefine](http://openrefine.org/), [SQLite Manager](https://addons.mozilla.org/en-US/firefox/addon/sqlite-manager-webext/), and [mx](https://github.com/mx3/mx) that aim to address many of these challenges.
 
-While these software are technically proficient can facilitate robust operations, we found that individual researchers were most productive tabulating data from specimens or literature into local spreadsheets rather than an online application. At this level, they needed frictionless data entry of the records they had available, rather than tedious interactions. After collecting their data, they needed a robust process to validate, reconcile, and ingest records into the larger body.
+While these software are capable of enforcing complex data models, we found that individual researchers were most productive tabulating data from specimens or literature into local spreadsheets rather than an online application. At this level, they needed frictionless data entry of the records they had available, rather than tedious interactions. After collecting their data, they needed a robust process to validate, reconcile, and ingest records into the larger body.
 
-Through collaboration, and iteration, we designed and built TraitDB to serve the use cases of these researchers. Primarily, TraitDB aids the ingest of datasets from different researchers working to synthesize a single, uniform dataset. Data ingest is driven by a flexible YAML template system, managed by the group. The process provides validation and cell-level feedback on input CSV datasets. It indicates which fields are required, warns of duplicate records, and performs validation on categorical and continuous variables. Prior to ingesting the data, TraitDB provides a structured report of warnings and requires that any errors are corrected.
+Through collaboration, and iteration, we designed and built TraitDB to serve the use cases of these researchers. Primarily, TraitDB aids ingest of datasets from different researchers working to synthesize a single, uniform dataset. Data ingest is driven by a flexible YAML template system, managed by the group of researchers. The process provides validation and observation-level feedback on input CSV datasets. It indicates which fields are required, warns of duplicate records, and performs validation on categorical and continuous variables. Prior to ingesting the data, TraitDB provides a structured report of warnings and requires that any errors are corrected.
 
 Once ingested, TraitDB provides several mechanisms for querying, browsing, subsetting, and fetching data via its web interface. It implements simple summary calculations of trait data over taxonomic levels, and provides CSV download of synthesized datasets for further analysis.
 

@@ -40,5 +34,3 @@ TraitDB has been used to gather and produce datasets hosted by the [Tree of Sex]
 # Acknowledgements
 
 TraitDB is a project of [National Evolutionary Synthesis Center](http://nescent.org/)
-
-# References
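
For readers of this diff, the ingest checks named in the revised Summary (required fields, duplicate-record warnings, and validation of categorical and continuous variables) can be illustrated with a minimal Python sketch. This is not TraitDB's implementation. The template layout, the field names, and the `validate_csv` function are all hypothetical, and a plain dictionary stands in for the YAML template, whose real schema does not appear in this commit.

```python
import csv
import io

# Hypothetical template. TraitDB drives ingest from a YAML file; a plain
# dict is used here only to keep the sketch free of third-party imports.
TEMPLATE = {
    "required": ["species", "body_mass_g"],
    "categorical": {"diet": {"herbivore", "carnivore", "omnivore"}},
    "continuous": {"body_mass_g": (0.0, 1e7)},  # inclusive (low, high) bounds
    "key": ["species"],  # fields that together identify a record
}


def validate_csv(text, template):
    """Return (errors, warnings), each a list of (row_number, field, message)."""
    errors, warnings = [], []
    seen = set()
    # Data rows start at line 2 because line 1 of the CSV is the header.
    for row_number, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        # Required fields must be present and non-blank.
        for field in template["required"]:
            if not (row.get(field) or "").strip():
                errors.append((row_number, field, "required value is missing"))

        # Categorical fields must use one of the allowed values when filled in.
        for field, allowed in template["categorical"].items():
            value = (row.get(field) or "").strip()
            if value and value not in allowed:
                errors.append((row_number, field, f"unknown category {value!r}"))

        # Continuous fields must parse as numbers and fall within bounds.
        for field, (low, high) in template["continuous"].items():
            value = (row.get(field) or "").strip()
            if not value:
                continue
            try:
                number = float(value)
            except ValueError:
                errors.append((row_number, field, f"not a number: {value!r}"))
                continue
            if not low <= number <= high:
                errors.append((row_number, field, f"out of range: {number}"))

        # Duplicate records produce warnings rather than errors.
        key = tuple((row.get(field) or "").strip() for field in template["key"])
        if key in seen:
            warnings.append((row_number, ",".join(template["key"]), "duplicate record"))
        seen.add(key)
    return errors, warnings
```

In this sketch an ingest step would proceed only when `errors` is empty, while `warnings` would be shown to the researcher for review, mirroring the distinction the Summary draws between warnings that are reported and errors that must be corrected.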
