Commit f88e4d6

Merge branch 'main' into development
Corrected date coercion in main branch
2 parents 1b14e14 + 2915c44 commit f88e4d6

8 files changed, with 358 additions and 29 deletions

DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 Package: excelDataGuide
 Title: Read Data from templated Excel Data Reports
-Version: 0.2.2
+Version: 0.2.5
 Authors@R:
     person("Douwe", "Molenaar", , "[email protected]", role = c("aut", "cre"),
            comment = c(ORCID = "0000-0001-7108-4545"))

R/utils.R

Lines changed: 5 additions & 1 deletion
@@ -112,6 +112,10 @@ coerce <- function(x, atomicclass) {
     "numeric" = as.numeric(x),
     "integer" = as.integer(x),
     "logical" = as.logical(x),
-    "date" = as.POSIXct(as.integer(x))
+    "date" = if (inherits(x, "POSIXct") || inherits(x, "Date")) {
+      as.Date(x)
+    } else {
+      as.Date(as.integer(x), origin="1899-12-30")
+    },
   )
 }
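
The new `date` branch can be tried in isolation. The sketch below copies that branch into a stand-alone helper; the name `coerce_date` is hypothetical and not part of the package. It assumes that plain numbers are Excel serial day counts, which is what the `1899-12-30` origin in the diff suggests.

```r
# Hypothetical stand-alone copy of the new "date" branch of coerce().
coerce_date <- function(x) {
  if (inherits(x, "POSIXct") || inherits(x, "Date")) {
    # Already a date-time or date object: only drop the time component.
    as.Date(x)
  } else {
    # Assumed to be an Excel serial day count (days since 1899-12-30).
    as.Date(as.integer(x), origin = "1899-12-30")
  }
}

# A serial number, as it might arrive from a cell stored as a number.
from_serial <- coerce_date(44927)

# A value that was already parsed as a date-time.
from_posix <- coerce_date(as.POSIXct("2023-01-01 12:00:00", tz = "UTC"))

print(from_serial)  # 2023-01-01
print(from_posix)   # 2023-01-01
```

Both inputs end up as a `Date`, so downstream code does not need to know how the cell was stored.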

README.Rmd

Lines changed: 59 additions & 3 deletions
@@ -22,13 +22,13 @@ knitr::opts_chunk$set(

 A *data reporting template* is a standardized spreadsheet file (in either xls or xlsx format) used for reporting and processing experimental data. These templates significantly reduce the time required for data analysis and encourage users to present their data in a structured format, minimizing errors and misinterpretations.

-The **excelDataGuide** package eliminates the need for users to write and maintain complex code for reading data from intricate spreadsheet DRTs. Additionally, it offers a robust framework for validating data, ensuring the correct data types are utilized, and facilitating data wrangling when necessary. This functionality supports *Interoperability* for DRTs, a key aspect of the [FAIR](https://www.go-fair.org/fair-principles/) principles.
+The **excelDataGuide** package eliminates the need for data analysts to write and maintain complex code for reading data from various complex spreadsheet DRTs. Additionally, it offers a robust framework for validating data, ensuring that the correct data types are utilized, and facilitating data wrangling when necessary. This functionality supports *Interoperability* for DRTs, a key aspect of the [FAIR](https://www.go-fair.org/fair-principles/) principles.

The package features a user-friendly interface for extracting data from Excel files and converting it into R objects. It accommodates three types of data structures: key-value pairs, tabular data, and microplate-formatted data. The locations of these structures within the Excel template are specified by a **data guide**, which is a YAML file — a structured format that is both human- and machine-readable.

 ## Installation

-You can install the development version of excelDataGuide from [GitHub](https://github.com/) with:
+You can install the development version of excelDataGuide in a recent version of R from GitHub with:

 ``` r
 # install.packages("pak")
@@ -48,6 +48,62 @@ data <- read_data(datafile, guidefile)

 The output of the `read_data()` function is a list object the format of which is determined for a large part by the design of the data guide.

+## Details
+
+### How it works
+
+When you design a template Excel file for data reporting and analysis you also create a *data guide* file that specifies the structure and location of the data in the template. If you design the template carefully you can use the same data guide for several versions of the template. That is, as long as the location of the indexed data does not change, you can use the same data guide for different versions of the template. You can specify the compatible version of the templates in the *data guide*. The package will check compatibility. Clearly, you should use versioned data templates, and hence, a required field in a template is its version number. An example of a template with data is provided in the package (`system.file("extdata", "example_data.xlsx", package = "excelDataGuide")`).
+
+Once you have entered the data and metadata in a template you can use the package to extract the data into R. The package will check and coerce the data types to the required formats.
+
+### Data guide
+
+The *data guide* is a human readable and editable file in [YAML](https://yaml.org/spec/1.2.2/) format that specifies the structure and location of the data in the Excel file. It contains a list of data types, each of which is defined by a name and a set of parameters. As the name suggests, the *data guide* is used by the **excelDataGuide** package as a guide to extract all indexed data from the Excel file and convert it into proper R objects. Part of the *data guide* from the example in the package, *i.e.* `system.file("extdata", "example_guide.yml", package = "excelDataGuide")` is shown below:
+
+``` yaml
+guide.version: '1.0'
+template.name: competition
+template.min.version: '9.3'
+template.max.version: ~
+plate.format: 96
+locations:
+  - sheet: description
+    type: cells
+    varname: .template
+    translate: false
+    variables:
+      - name: version
+        cell: B2
+  - sheet: description
+    type: keyvalue
+    translate: true
+    atomicclass:
+      - character
+      - character
+      - character
+      - character
+      - character
+      - date
+      - character
+      - numeric
+      - character
+      - numeric
+      - character
+      - numeric
+      - character
+      - character
+    varname: metadata
+    ranges:
+      - A10:B21
+      - A24:B25
+```
+
+We provide a json schema for the data guide, allowing you to check the validity of
+guides that you wrote. The schema is available in the package as
+`system.file("extdata", "excelguide_schema.json", package = "excelDataGuide")`. To
+check its validity against the schema you can use the [Polyglottal JSON Schema Validator](https://www.npmjs.com/package/pajv). More details can be found in the vignette (to be done, see below).
+
 ## Future work

-We want to provide guide and template structures for data types without upper size limit, like time series with no pre-determined length.
+- Complete the vignette ([issue](https://github.com/SystemsBioinformatics/excelDataGuide/issues/2))
+- Provide guide and template structures for data types without upper size limit, typically time series with no pre-determined length ([issue](https://github.com/SystemsBioinformatics/excelDataGuide/issues/1)).

README.md

Lines changed: 93 additions & 10 deletions
@@ -16,12 +16,12 @@ experimental data. These templates significantly reduce the time
 required for data analysis and encourage users to present their data in
 a structured format, minimizing errors and misinterpretations.

-The **excelDataGuide** package eliminates the need for users to write
-and maintain complex code for reading data from intricate spreadsheet
-DRTs. Additionally, it offers a robust framework for validating data,
-ensuring the correct data types are utilized, and facilitating data
-wrangling when necessary. This functionality supports *Interoperability*
-for DRTs, a key aspect of the
+The **excelDataGuide** package eliminates the need for data analysts to
+write and maintain complex code for reading data from various complex
+spreadsheet DRTs. Additionally, it offers a robust framework for
+validating data, ensuring that the correct data types are utilized, and
+facilitating data wrangling when necessary. This functionality supports
+*Interoperability* for DRTs, a key aspect of the
 [FAIR](https://www.go-fair.org/fair-principles/) principles.

 The package features a user-friendly interface for extracting data from
@@ -33,8 +33,8 @@ a structured format that is both human- and machine-readable.

 ## Installation

-You can install the development version of excelDataGuide from
-[GitHub](https://github.com/) with:
+You can install the development version of excelDataGuide in a recent
+version of R from GitHub with:

 ``` r
 # install.packages("pak")
@@ -57,7 +57,90 @@ data <- read_data(datafile, guidefile)
 The output of the `read_data()` function is a list object the format of
 which is determined for a large part by the design of the data guide.

+## Details
+
+### How it works
+
+When you design a template Excel file for data reporting and analysis
+you also create a *data guide* file that specifies the structure and
+location of the data in the template. If you design the template
+carefully you can use the same data guide for several versions of the
+template. That is, as long as the location of the indexed data does not
+change, you can use the same data guide for different versions of the
+template. You can specify the compatible version of the templates in the
+*data guide*. The package will check compatibility. Clearly, you should
+use versioned data templates, and hence, a required field in a template
+is its version number. An example of a template with data is provided in
+the package
+(`system.file("extdata", "example_data.xlsx", package = "excelDataGuide")`).
+
+Once you have entered the data and metadata in a template you can use
+the package to extract the data into R. The package will check and
+coerce the data types to the required formats.
+
+### Data guide
+
+The *data guide* is a human readable and editable file in
+[YAML](https://yaml.org/spec/1.2.2/) format that specifies the structure
+and location of the data in the Excel file. It contains a list of data
+types, each of which is defined by a name and a set of parameters. As
+the name suggests, the *data guide* is used by the **excelDataGuide**
+package as a guide to extract all indexed data from the Excel file and
+convert it into proper R objects. Part of the *data guide* from the
+example in the package, *i.e.*
+`system.file("extdata", "example_guide.yml", package = "excelDataGuide")`
+is shown below:
+
+``` yaml
+guide.version: '1.0'
+template.name: competition
+template.min.version: '9.3'
+template.max.version: ~
+plate.format: 96
+locations:
+  - sheet: description
+    type: cells
+    varname: .template
+    translate: false
+    variables:
+      - name: version
+        cell: B2
+  - sheet: description
+    type: keyvalue
+    translate: true
+    atomicclass:
+      - character
+      - character
+      - character
+      - character
+      - character
+      - date
+      - character
+      - numeric
+      - character
+      - numeric
+      - character
+      - numeric
+      - character
+      - character
+    varname: metadata
+    ranges:
+      - A10:B21
+      - A24:B25
+```
+
+We provide a json schema for the data guide, allowing you to check the
+validity of guides that you wrote. The schema is available in the
+package as
+`system.file("extdata", "excelguide_schema.json", package = "excelDataGuide")`.
+To check its validity against the schema you can use the [Polyglottal
+JSON Schema Validator](https://www.npmjs.com/package/pajv). More details
+can be found in the vignette (to be done, see below).
+
 ## Future work

-We want to provide guide and template structures for data types without
-upper size limit, like time series with no pre-determined length.
+- Complete the vignette
+  ([issue](https://github.com/SystemsBioinformatics/excelDataGuide/issues/2))
+- Provide guide and template structures for data types without upper
+  size limit, typically time series with no pre-determined length
+  ([issue](https://github.com/SystemsBioinformatics/excelDataGuide/issues/1)).
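
The README text added here says that the package checks a template's version against the range given in the data guide. A rough picture of such a check is sketched below. It is not the package's implementation: the helper `is_compatible` is hypothetical, versions are compared with base R's `numeric_version()`, and a `template.max.version` of `~` (YAML null, passed here as `NULL`) is assumed to mean that there is no upper bound.

```r
# Hypothetical helper: is a template version inside the guide's range?
is_compatible <- function(template_version, min_version, max_version = NULL) {
  v <- numeric_version(template_version)
  ok_min <- v >= numeric_version(min_version)
  # Assumption: a missing maximum means "no upper bound".
  ok_max <- is.null(max_version) || v <= numeric_version(max_version)
  ok_min && ok_max
}

is_compatible("9.3", "9.3")          # TRUE
is_compatible("9.10", "9.3")         # TRUE, component-wise: 10 > 3
is_compatible("9.2", "9.3")          # FALSE
is_compatible("10.0", "9.3", "9.9")  # FALSE
```

Comparing with `numeric_version()` rather than as plain numbers matters for versions such as `9.10`, which a numeric comparison would treat as `9.1`.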

inst/extdata/example_guide.yml

Lines changed: 29 additions & 14 deletions
@@ -14,26 +14,41 @@ locations:
   - sheet: description
     type: keyvalue
     translate: true
+    atomicclass:
+      - character
+      - character
+      - character
+      - character
+      - character
+      - date
+      - character
+      - numeric
+      - character
+      - numeric
+      - character
+      - numeric
+      - character
+      - character
     varname: metadata
     ranges:
-      - A10:B14
-      - A16:B16
-      - A18:B18
-      - A20:B20
+      - A10:B21
       - A24:B25
-  - sheet: description
-    type: keyvalue
-    translate: true
-    atomicclass: numeric
-    varname: metadata
-    ranges:
-      - A15:B15
-      - A17:B17
-      - A19:B19
-      - A21:B21
   - sheet: _data
     type: platedata
     translate: false
+    atomicclass:
+      - character
+      - integer
+      - character
+      - numeric
+      - numeric
+      - numeric
+      - numeric
+      - numeric
+      - numeric
+      - character
+      - character
+      - character
     varname: plate
     ranges:
       - A1:M9
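
In the rewritten `metadata` entry, the two ranges together span 14 rows (`A10:B21` has 12, `A24:B25` has 2), which equals the number of `atomicclass` entries. That suggests one class per key-value row, although the diff does not state this rule, so treat it as an assumption. A small sketch of such a consistency check for `keyvalue` entries, using a hypothetical helper `range_rows`:

```r
# Hypothetical helper: number of rows spanned by an A1-style range.
range_rows <- function(range) {
  # Pull out the row numbers, e.g. "A10:B21" gives 10 and 21.
  rows <- as.integer(regmatches(range, gregexpr("[0-9]+", range))[[1]])
  max(rows) - min(rows) + 1L
}

# Values copied from the keyvalue entry in the diff above.
ranges <- c("A10:B21", "A24:B25")
atomicclass <- c(rep("character", 5), "date", "character", "numeric",
                 "character", "numeric", "character", "numeric",
                 "character", "character")

total_rows <- sum(vapply(ranges, range_rows, integer(1)))
total_rows == length(atomicclass)  # TRUE: 14 rows, 14 classes
```

The same count does not obviously apply to the `platedata` entry, whose 12 classes do not match the 9 rows of `A1:M9`, so the check is limited to key-value ranges here.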
