Commit 54f720e

update readme and provide json schema
1 parent 3ac5ea7 commit 54f720e

4 files changed (+386, -257 lines changed)

README.Rmd

Lines changed: 13 additions & 2 deletions
@@ -50,11 +50,17 @@ The output of the `read_data()` function is a list object the format of which is
 
 ## Details
 
+### How it works
+
+When you design a template Excel file for data reporting and analysis you also create a *data guide* file that specifies the structure and location of the data in the template. If you design the template carefully you can use the same data guide for several versions of the template. That is, as long as the location of the indexed data does not change, you can use the same data guide for different versions of the template. You can specify the compatible version of the templates in the *data guide*. The package will check compatibility. Clearly, you should use versioned data templates, and hence, a required field in a template is its version number. An example of a template with data is provided in the package (`system.file("extdata", "example_data.xlsx", package = "excelDataGuide")`).
+
+Once you have entered the data and metadata in a template you can use the package to extract the data into R. The package will check and coerce the data types to the required formats.
+
 ### Data guide
 
-The *data guide* is a human readable and editable file in [YAML](https://yaml.org/spec/1.2.2/) format that specifies the structure and location of the data in the Excel file. It contains a list of data types, each of which is defined by a name and a set of parameters. As the name suggests, the *data guide* is used by the **excelDataGuide** package as a guide to extract all indexed data from the Excel file and convert it into proper R objects. An example of part of a *data guide* is shown below:
+The *data guide* is a human readable and editable file in [YAML](https://yaml.org/spec/1.2.2/) format that specifies the structure and location of the data in the Excel file. It contains a list of data types, each of which is defined by a name and a set of parameters. As the name suggests, the *data guide* is used by the **excelDataGuide** package as a guide to extract all indexed data from the Excel file and convert it into proper R objects. Part of the *data guide* from the example in the package, *i.e.* `system.file("extdata", "example_guide.yml", package = "excelDataGuide")` is shown below:
 
-```
+``` yaml
 guide.version: '1.0'
 template.name: competition
 template.min.version: '9.3'
@@ -92,6 +98,11 @@ locations:
       - A24:B25
 ```
 
+We provide a json schema for the data guide, allowing you to check the validity of
+guides that you wrote. The schema is available in the package as
+`system.file("extdata", "excelguide_schema.json", package = "excelDataGuide")`. To
+check its validity against the schema you can use the [Polyglottal JSON Schema Validator](https://www.npmjs.com/package/pajv).
+
 ## Future work
 
 We want to provide guide and template structures for data types without upper size limit, typically time series with no pre-determined length.
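
The schema paragraph added to README.Rmd above names pajv as one way to validate a guide. As a minimal sketch, the check could be driven from R roughly as follows. `pajv_args()` and `validate_guide()` are hypothetical helpers, not functions from excelDataGuide. The sketch assumes that pajv is installed and on the PATH, that it accepts `-s <schema> -d <data>`, and that it exits with status 0 when the data file is valid.

```r
# Hypothetical helpers (not part of excelDataGuide).
# Assumption: pajv takes the schema via -s and the data file via -d.

# Build the command-line arguments; paths are quoted in case they contain spaces.
pajv_args <- function(guide, schema) {
  quote_type <- if (.Platform$OS.type == "windows") "cmd" else "sh"
  c("-s", shQuote(schema, type = quote_type),
    "-d", shQuote(guide, type = quote_type))
}

# Run the validator; returns TRUE/FALSE based on the exit status,
# or NA when the validator executable cannot be found.
validate_guide <- function(guide, schema, validator = "pajv") {
  exe <- unname(Sys.which(validator))
  if (!nzchar(exe)) {
    message("Validator '", validator, "' not found on the PATH; skipping validation.")
    return(NA)
  }
  status <- system2(exe, pajv_args(guide, schema))
  status == 0L
}

# Usage with the files shipped in the package (only runs if it is installed).
if (requireNamespace("excelDataGuide", quietly = TRUE)) {
  schema <- system.file("extdata", "excelguide_schema.json", package = "excelDataGuide")
  guide  <- system.file("extdata", "example_guide.yml", package = "excelDataGuide")
  validate_guide(guide, schema)
}
```

Returning `NA` rather than raising an error when the validator is missing keeps the sketch usable in scripts that should continue without Node.js tooling; a stricter variant could call `stop()` instead.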

README.md

Lines changed: 81 additions & 9 deletions
@@ -16,12 +16,12 @@ experimental data. These templates significantly reduce the time
 required for data analysis and encourage users to present their data in
 a structured format, minimizing errors and misinterpretations.
 
-The **excelDataGuide** package eliminates the need for users to write
-and maintain complex code for reading data from intricate spreadsheet
-DRTs. Additionally, it offers a robust framework for validating data,
-ensuring the correct data types are utilized, and facilitating data
-wrangling when necessary. This functionality supports *Interoperability*
-for DRTs, a key aspect of the
+The **excelDataGuide** package eliminates the need for data analysts to
+write and maintain complex code for reading data from various complex
+spreadsheet DRTs. Additionally, it offers a robust framework for
+validating data, ensuring that the correct data types are utilized, and
+facilitating data wrangling when necessary. This functionality supports
+*Interoperability* for DRTs, a key aspect of the
 [FAIR](https://www.go-fair.org/fair-principles/) principles.
 
 The package features a user-friendly interface for extracting data from
@@ -33,8 +33,8 @@ a structured format that is both human- and machine-readable.
 
 ## Installation
 
-You can install the development version of excelDataGuide from
-[GitHub](https://github.com/) with:
+You can install the development version of excelDataGuide in a recent
+version of R from GitHub with:
 
 ``` r
 # install.packages("pak")
@@ -57,7 +57,79 @@ data <- read_data(datafile, guidefile)
 The output of the `read_data()` function is a list object the format of
 which is determined for a large part by the design of the data guide.
 
+## Details
+
+### How it works
+
+When you design a template Excel file for data reporting and analysis
+you also create a *data guide* file that specifies the structure and
+location of the data in the template. If you design the template
+carefully you can use the same data guide for several versions of the
+template. That is, as long as the location of the indexed data does not
+change, you can use the same data guide for different versions of the
+template. You can specify the compatible version of the templates in the
+*data guide*. The package will check compatibility. Clearly, you should
+use versioned data templates, and hence, a required field in a template
+is its version number. An example of a template with data is provided in
+the package
+(`system.file("extdata", "example_data.xlsx", package = "excelDataGuide")`).
+
+Once you have entered the data and metadata in a template you can use
+the package to extract the data into R. The package will check and
+coerce the data types to the required formats.
+
+### Data guide
+
+The *data guide* is a human readable and editable file in
+[YAML](https://yaml.org/spec/1.2.2/) format that specifies the structure
+and location of the data in the Excel file. It contains a list of data
+types, each of which is defined by a name and a set of parameters. As
+the name suggests, the *data guide* is used by the **excelDataGuide**
+package as a guide to extract all indexed data from the Excel file and
+convert it into proper R objects. Part of the *data guide* from the
+example in the package, *i.e.*
+`system.file("extdata", "example_guide.yml", package = "excelDataGuide")`
+is shown below:
+
+``` yaml
+guide.version: '1.0'
+template.name: competition
+template.min.version: '9.3'
+template.max.version: ~
+plate.format: 96
+locations:
+  - sheet: description
+    type: cells
+    varname: .template
+    translate: false
+    variables:
+      - name: version
+        cell: B2
+  - sheet: description
+    type: keyvalue
+    translate: true
+    atomicclass:
+      - character
+      - character
+      - character
+      - character
+      - character
+      - date
+      - character
+      - numeric
+      - character
+      - numeric
+      - character
+      - numeric
+      - character
+      - character
+    varname: metadata
+    ranges:
+      - A10:B21
+      - A24:B25
+```
+
 ## Future work
 
 We want to provide guide and template structures for data types without
-upper size limit, like time series with no pre-determined length.
+upper size limit, typically time series with no pre-determined length.
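
The "How it works" text added in this diff says the package checks whether a template's version is compatible with the data guide. The sketch below shows one way such a check could be written in base R. `is_compatible()` is a hypothetical helper and not the package's actual implementation, which may differ. It assumes versions are given as strings (as in `template.min.version: '9.3'`) and treats a missing maximum (`~` in YAML, `NULL` in R) as "no upper bound".

```r
# Hypothetical helper (not part of excelDataGuide): test whether a template
# version lies within the range declared in a data guide.
is_compatible <- function(template_version, min_version, max_version = NULL) {
  v <- numeric_version(as.character(template_version))
  ok <- v >= numeric_version(as.character(min_version))
  if (!is.null(max_version)) {
    # Only apply the upper bound when the guide declares one.
    ok <- ok && v <= numeric_version(as.character(max_version))
  }
  ok
}

is_compatible("9.3", min_version = "9.3")    # TRUE: the lower bound is inclusive
is_compatible("9.10", min_version = "9.3")   # TRUE: compared component-wise, 10 > 3
is_compatible("9.2", min_version = "9.3")    # FALSE: older than the minimum
is_compatible("10.0", min_version = "9.3", max_version = "9.9")  # FALSE: above the maximum
```

Comparing with `numeric_version()` rather than as numbers or plain strings avoids treating `9.10` as smaller than `9.3`, which is why the guide quotes its version fields.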
