README.Rmd
Lines changed: 59 additions & 3 deletions
@@ -22,13 +22,13 @@ knitr::opts_chunk$set(
22
22
23
23
A *data reporting template* is a standardized spreadsheet file (in either xls or xlsx format) used for reporting and processing experimental data. These templates significantly reduce the time required for data analysis and encourage users to present their data in a structured format, minimizing errors and misinterpretations.
24
24
25
-
The **excelDataGuide** package eliminates the need for users to write and maintain complex code for reading data from intricate spreadsheet DRTs. Additionally, it offers a robust framework for validating data, ensuring the correct data types are utilized, and facilitating data wrangling when necessary. This functionality supports *Interoperability* for DRTs, a key aspect of the [FAIR](https://www.go-fair.org/fair-principles/) principles.
25
+
The **excelDataGuide** package eliminates the need for data analysts to write and maintain complex code for reading data from intricate spreadsheet-based data reporting templates (DRTs). It also offers a robust framework for validating data, ensuring that the correct data types are used, and wrangling data when necessary. This functionality supports *Interoperability* for DRTs, a key aspect of the [FAIR](https://www.go-fair.org/fair-principles/) principles.
26
26
27
27
The package features a user-friendly interface for extracting data from Excel files and converting it into R objects. It accommodates three types of data structures: key-value pairs, tabular data, and microplate-formatted data. The locations of these structures within the Excel template are specified by a **data guide**, which is a YAML file — a structured format that is both human- and machine-readable.
28
28
29
29
## Installation
30
30
31
-
You can install the development version of excelDataGuide from [GitHub](https://github.com/) with:
31
+
You can install the development version of excelDataGuide in a recent version of R from GitHub with:
32
32
33
33
```r
34
34
# install.packages("pak")
@@ -48,6 +48,62 @@ data <- read_data(datafile, guidefile)
48
48
49
49
The output of the `read_data()` function is a list whose structure is largely determined by the design of the data guide.
50
50
51
+
## Details
52
+
53
+
### How it works
54
+
55
+
When you design an Excel template for data reporting and analysis, you also create a *data guide* file that specifies the structure and location of the data in the template. As long as the locations of the indexed data do not change, the same data guide can be used for several versions of the template. The *data guide* states which template versions it is compatible with, and the package checks this compatibility. Templates must therefore be versioned, and the version number is a required field in every template. An example of a template with data is provided in the package (`system.file("extdata", "example_data.xlsx", package = "excelDataGuide")`).
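The version check described above can be sketched in base R. This is only an illustration, not the package's implementation: `is_compatible()` is a hypothetical helper, and it assumes that versions are dotted strings such as `'9.3'` and that a missing maximum version means there is no upper bound. Comparing with `numeric_version()` rather than as plain numbers matters because `9.10` is a later version than `9.3`, although it is the smaller number.

```r
# Hypothetical helper, not part of excelDataGuide.
# Returns TRUE when the template version lies within the guide's bounds.
is_compatible <- function(template_version, min_version, max_version = NULL) {
  v <- numeric_version(template_version)
  ok <- v >= numeric_version(min_version)
  # Assumption: a NULL maximum means "no upper bound".
  if (!is.null(max_version)) {
    ok <- ok && v <= numeric_version(max_version)
  }
  ok
}

is_compatible("9.3", "9.3")          # TRUE
is_compatible("9.10", "9.3")         # TRUE: 9.10 is later than 9.3
is_compatible("9.2", "9.3")          # FALSE
is_compatible("10.0", "9.3", "9.5")  # FALSE
```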
56
+
57
+
Once you have entered the data and metadata in a template, you can use the package to extract the data into R. The package checks the data types and coerces the values to the required types.
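To give an idea of what such type coercion involves, here is a minimal base R sketch. It does not show how the package does it: `coerce_value()` is a hypothetical helper, the variable names and values are made up, and dates are assumed to arrive as ISO 8601 strings. The class names `character`, `numeric` and `date` are the ones that appear in the example data guide.

```r
# Hypothetical helper, not part of excelDataGuide.
# Coerces a raw value to the class named in the guide, failing loudly
# when the value cannot be converted.
coerce_value <- function(x, class) {
  switch(class,
    character = as.character(x),
    numeric = {
      out <- suppressWarnings(as.numeric(x))
      if (anyNA(out) && !anyNA(x)) {
        stop("Cannot coerce to numeric: ", x)
      }
      out
    },
    date = as.Date(x),  # assumption: ISO 8601 text such as "2024-01-15"
    stop("Unsupported class: ", class)
  )
}

# Made-up key-value pairs as they might be read from a sheet.
raw <- list(operator = "A. Jansen", temperature = "37.5", date = "2024-01-15")
classes <- c("character", "numeric", "date")

typed <- Map(coerce_value, raw, classes)
str(typed)
```

Failing with an error on an unconvertible value, rather than silently returning `NA`, is a design choice of this sketch.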
58
+
59
+
### Data guide
60
+
61
+
The *data guide* is a human-readable and editable file in [YAML](https://yaml.org/spec/1.2.2/) format that specifies the structure and location of the data in the Excel file. It contains a list of data types, each of which is defined by a name and a set of parameters. As the name suggests, the **excelDataGuide** package uses the *data guide* to extract all indexed data from the Excel file and convert it into proper R objects. Part of the *data guide* from the example in the package, *i.e.* `system.file("extdata", "example_guide.yml", package = "excelDataGuide")`, is shown below:
62
+
63
+
```yaml
guide.version: '1.0'
template.name: competition
template.min.version: '9.3'
template.max.version: ~
plate.format: 96
locations:
  - sheet: description
    type: cells
    varname: .template
    translate: false
    variables:
      - name: version
        cell: B2
  - sheet: description
    type: keyvalue
    translate: true
    atomicclass:
      - character
      - character
      - character
      - character
      - character
      - date
      - character
      - numeric
      - character
      - numeric
      - character
      - numeric
      - character
      - character
    varname: metadata
    ranges:
      - A10:B21
      - A24:B25
```
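As a rough consistency check on a guide entry like the `keyvalue` block above, one can count the rows spanned by its `ranges` and compare that with the number of `atomicclass` entries. The base R sketch below rests on an assumption that the README does not state, namely that each row of a key-value range corresponds to one `atomicclass` entry. `range_rows()` is a hypothetical helper, not a package function.

```r
# Hypothetical helper, not part of excelDataGuide.
# Counts the rows spanned by a cell range such as "A10:B21".
range_rows <- function(range) {
  # Pull the two row numbers out of the range string.
  rows <- as.integer(regmatches(range, gregexpr("[0-9]+", range))[[1]])
  rows[2] - rows[1] + 1L
}

# Values copied from the guide excerpt.
ranges <- c("A10:B21", "A24:B25")
atomicclass <- c(
  rep("character", 5), "date", "character", "numeric", "character",
  "numeric", "character", "numeric", "character", "character"
)

n_rows <- sum(vapply(ranges, range_rows, integer(1)))
n_rows                         # 14
length(atomicclass) == n_rows  # TRUE
```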
100
+
101
+
We provide a JSON Schema for the data guide, so that you can check the validity of the guides you write. The schema is available in the package as `system.file("extdata", "excelguide_schema.json", package = "excelDataGuide")`. To validate a guide against the schema, you can use the [Polyglottal JSON Schema Validator](https://www.npmjs.com/package/pajv). More details will be given in the vignette, which is still to be written (see below).
105
+
51
106
## Future work
52
107
53
-
We want to provide guide and template structures for data types without upper size limit, like time series with no pre-determined length.
108
+
- Complete the vignette ([issue](https://github.com/SystemsBioinformatics/excelDataGuide/issues/2)).
109
+
- Provide guide and template structures for data types without an upper size limit, typically time series with no predetermined length ([issue](https://github.com/SystemsBioinformatics/excelDataGuide/issues/1)).