Commit a5cee54

committed
update of documentation
1 parent b64e3a1 commit a5cee54

1 file changed

+96
-104
lines changed

vignettes/writing_templates_and_data_guides.Rmd

Lines changed: 96 additions & 104 deletions
@@ -20,42 +20,43 @@ library(excelDataGuide)
2020

2121
## Introduction
2222

23-
Spreadsheets are a widely used tool in the biochemical laboratory, both to
24-
record and to analyze experiments. When such experiments become a routine we
25-
often create spreadsheet templates to save time and to structure our work. When
26-
analyzing a lot of these experiments switching to a scripting language like R or
27-
Python for analysis will become useful. Also in these cases the spreadsheet template
28-
is a useful way to structure the recording of experimental data and metadata.
29-
30-
The goal of the excelDataGuide package is to be able to use both Excel-compliant
31-
spreadsheets and scripts as data analysis tools. Clearly, a scripting language
23+
The spreadsheet is a widely used tool in the biochemical laboratory, both for
24+
recording and analyzing experiments. When such experiments become routine, we
25+
often create spreadsheet templates to save time and to structure our work.
26+
27+
The goal of the excelDataGuide package is to be able to use Excel spreadsheets
28+
as well as scripts as data analysis tools. Clearly, a scripting language
3229
has more potential when it comes to analyzing large data sets, consisting of
3330
multiple notebooks.
3431

3532
Importantly, **the source of all data is the spreadsheet.** This concerns
3633
metadata, parameters like acceptance criteria, concentrations and measured data.
37-
This guarantees that calculations in the spreadsheet and in the scripts are all
38-
based on the same underlying data.
39-
40-
Concerning calculated data it may or may not be useful to let the spreadsheet be
41-
the source of such data for the script as well. This may be particularly useful
42-
when it concerns calculations that are carried out automatically upon entry
43-
of data by the user.
34+
This *one-source* policy guarantees that calculations in the spreadsheet and in
35+
the scripts are all based on the same underlying data and parameters.
4436

45-
Part of these data, like acceptance criteria, is determined in the SOP and fixed
37+
Parameters, like acceptance criteria, are determined in the SOP and fixed
4638
in the spreadsheet template, whereas other data may vary per experiment and is
4739
entered by the user. For example, when a user performs parameter fitting, it may
4840
be useful to compare the fitted parameters to those obtained in another
4941
programming environment.
5042

51-
## Writing a template
43+
Concerning calculated data it may or may not be useful to let the spreadsheet be
44+
the source of such data for the script as well. This may be particularly useful
45+
when it concerns calculations that are carried out automatically upon entry
46+
of data by the user.
47+
48+
## Structuring a template
5249

5350
Below is an example of the front page of a template (of the fitc-t4 TTR assay),
5451
illustrating a number of ideas and concepts that we discuss below.
5552

5653
![front page](images/template_frontpage.png){width=100%}
5754

58-
### A template has a version number
55+
### A template must have a version number
56+
57+
Unique template version numbers are a way to prevent misunderstandings
58+
between users and are also needed here to check whether a data guide is
59+
compatible with the template version.
5960

6061
#### Version numbering rules
6162

@@ -71,43 +72,62 @@ is recorded should be *text*, and not *general* or *number*
7172

7273
#### A template name is optional
7374

74-
Preferably, a template also has a name. Note that the example above doesn't have
75-
a name.
75+
Preferably, a template also has a name. Note that the example in the figure
76+
above doesn't have a name.
7677

7778
#### Checking compatibility of template versions and a guide version
7879

79-
We use template version numbers to check compatibility with a guide. That is
80-
because the same guide could in principle be used for multiple versions of a
81-
template, for example because only explanatory texts or calculations have
82-
changed but not locations of data. When checking version compatibility we
80+
We use template version numbers to check compatibility with a guide. In principle
81+
the same guide can be used for multiple versions of a template as long as the
82+
locations and names of variables indexed in the guide did not change. This is the
83+
case when, for example, only explanatory texts or calculations or data validity
84+
checks have changed in the template. When checking version compatibility we
8385
assume that a guide is compatible with a consecutive range of template versions
8486
between a minimal and a maximal version number.
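
The range check described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper `is_compatible` and its argument names are hypothetical. It relies on `numeric_version()`, which compares versions component by component, so that "9.10" counts as later than "9.5". Comparing the same strings as decimal numbers would order them the other way round, which is presumably one reason to store version numbers as text.

```r
# Hypothetical helper (not part of the documented excelDataGuide API):
# is the template version inside the consecutive range the guide supports?
is_compatible <- function(template_version, min_version, max_version) {
  v <- numeric_version(template_version)
  v >= numeric_version(min_version) && v <= numeric_version(max_version)
}

is_compatible("9.4", "9.3", "9.5")   # TRUE
is_compatible("9.10", "9.3", "9.5")  # FALSE: 9.10 is later than 9.5
```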
8587

8688
### All cells are protected except those for data entry
8789

88-
Data entry cells have a distinct background color
90+
Data entry cells have a distinct background color, here "marker yellow". All
91+
other cells have protected status to prevent users from inadvertently changing
92+
them.
8993

9094
### Include comments
9195

9296
Refer to the SOP+ version
9397

9498
### Built-in data entry checks
9599

100+
Data entered by users should be verified by built-in validity checks,
101+
especially when misunderstandings are likely to happen. The validity checking
102+
capability of Excel is limited. In cases where the data structure cannot be
103+
properly described by a validity rule we add a comment next to the cell in which
104+
the data is entered.
105+
96106
### A single source of parameters
97107

98108
![The parameters as key-value pairs](images/parameters.png){width=35%}
99109

110+
Parameters needed for calculations, for example for acceptance criteria of
111+
measurements, are best entered on a separate sheet and referred to by absolute
112+
references in calculations. In the case of the example we have a separate
113+
hidden sheet called *_parameters* for this purpose. The information in this
114+
sheet is indexed in the data guide and is therefore available to R scripts as
115+
well.
116+
100117
### Use of hidden worksheets for data transfer
101118

102119
![A hidden sheet with links to plate-formatted data](images/data.png){width=100%}
103120

121+
122+
104123
## What else?
105124

106125
To facilitate automatic reading from the spreadsheet by scripts, data must be
107126
in one of these four formats:
108127

109128
- **keyvalue** format. Here, the key and value are placed in horizontally
110-
adjacent cells (columns). The key is to be used as the parameter name in the
129+
adjacent cells (columns). The key, or its translated short name (see below)
130+
is to be used as the parameter name in the
111131
scripts and should conform to variable naming rules for the scripting language
112132
used. The key is found in the left-most cell of a cell range. The value can be a
113133
single value (one cell) or a vector of values (multiple cells).
@@ -127,25 +147,22 @@ data guide.
127147

128148

129149
The keyvalue format will be mostly used for metadata and parameters. All keyvalue
130-
will be aggregated in a single named list caled "keyvalue".
150+
data will be aggregated into a single named list called "keyvalue".
131151

132152
The platedata format will be used for measured data and data concerning
133153
concentrations in the plate wells. All ranges will be aggregated in a single
134154
data frame with reported variables as column names, including the column names
135155
"row" and "col", corresponding to the row and column names of the plate.
136156
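
To make the two aggregated structures concrete, here is a minimal base R sketch of what a script might end up with. It is not the package's implementation: the helper `plate_to_long`, the variable names `concentration` and `signal`, and all values are made up for illustration. Only the list name "keyvalue" and the column names "row" and "col" come from the text above.

```r
# keyvalue data: one named list, addressed by (short) key names.
# Keys and values here are illustrative only.
keyvalue <- list(studyID = "S001", exptID = "E042")

# Hypothetical helper: turn a plate-shaped matrix into long format,
# one line per well, with "row" and "col" identifying the well.
plate_to_long <- function(m, varname) {
  long <- data.frame(
    row = rep(rownames(m), times = ncol(m)),
    col = rep(colnames(m), each = nrow(m)),
    stringsAsFactors = FALSE
  )
  long[[varname]] <- as.vector(m)  # column-major, matching row/col above
  long
}

# Two plate-shaped ranges covering the same wells (a 2 x 3 toy plate).
wells <- list(c("A", "B"), c("1", "2", "3"))
conc <- matrix(c(0, 0, 1, 1, 10, 10), nrow = 2, dimnames = wells)
signal <- matrix(1:6, nrow = 2, dimnames = wells)

# Aggregate all ranges into a single data frame keyed by well.
platedata <- merge(
  plate_to_long(conc, "concentration"),
  plate_to_long(signal, "signal"),
  by = c("row", "col")
)
```

Each reported variable becomes one column, so adding a further range only adds a column rather than a new object.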

137-
Clearly, to make sure that calculations made in the spreadsheet and in the
138-
script use the same values, the spreadsheet should use parameter values *etc.*
139-
by the (preferably absolute) cell-reference mechanism, whereas the script should
140-
use these values by reference to their variable names.
157+
## Constructing a guide
141158

142-
Every spreadsheet template should be accompanied by a guide indicating the
143-
sheets and ranges in which keyvalue and platedata formatted data are to be found
144-
in the filled out template. This guide is a yaml file.
159+
Every spreadsheet template should be accompanied by a data guide, an index
160+
registering the location of different data structures in the template. This
161+
guide is a YAML file, a human-editable and machine-readable file format.
145162

146-
This guide is structured as follows:
163+
Below is an example of the first rows of a data guide:
147164

148-
```{yaml}
165+
``` yaml
149166
guide.version: '1.0'
150167
template.name: competition
151168
template.min.version: '9.3'
@@ -162,80 +179,30 @@ locations:
162179
- sheet: description
163180
type: keyvalue
164181
translate: true
182+
atomicclass:
183+
- character
184+
- character
185+
- character
186+
- character
187+
- character
188+
- date
189+
- character
190+
- numeric
191+
- character
192+
- numeric
193+
- character
194+
- numeric
195+
- character
196+
- character
165197
varname: metadata
166198
ranges:
167-
- A10:B14
168-
- A16:B16
169-
- A18:B18
170-
- A20:B20
199+
- A10:B21
171200
- A24:B25
172-
- sheet: concentration response
173-
type: table
174-
translate: false
175-
varname: userresults
176-
atomicclass: numeric
177-
ranges:
178-
- J3:M5
179-
- sheet: BGfluo
180-
type: cells
181-
varname: userchecks
182-
translate: false
183-
atomicclass: numeric
184-
variables:
185-
- name: spread.itm1
186-
cell: G6
187-
- name: spread.itm2
188-
cell: G33
189-
translations:
190-
- long: Version
191-
short: template.version
192-
- long: Template Name
193-
short: template.name
194-
- long: Study identifier
195-
short: studyID
196-
- long: Experiment identifier
197-
short: exptID
198201
# remainder not shown
199202
```
200203

201204
A guide must contain the following elements:
202205

203-
- **template.name**: the name of the data reporting template.
204-
- **template.version**: the version of the data reporting template.
205-
- **plate.format**: the format of the microplate used in the experiment (valid values are 24, 48, 96 and 384).
206-
- **locations**: a list of locations in the spreadsheet where data are to be found. Each location is a list of elements.
207-
- **translations**: a list of translations between long and short names for variables.
208-
209-
The location data indicate where data are to be found, whereas the translation
210-
part contains translations between long and short names for variables. Short
211-
names are used as variable names in the scripts, whereas long names may be used
212-
in the spreadsheet, in particular when these are visible to the user. In that
213-
case the names should be translated before using them in the script. Reverse
214-
translations may be used by the script in the output document.
215-
216-
Required elements in a location are:
217-
218-
- **sheet**: the name of the sheet in which the data are to be found.
219-
- **type**: the format of the data in the range.
220-
- **translate**: (*true*, *false*) whether the variable names should be translated before use in the script.
221-
- **varname**: the name of the variable in which the data will be available in the script.
222-
- **ranges**: the ranges in which the data are to be found.
223-
224-
Furthermore, an optional element **atomicclass** can be provided which can have
225-
values "character", "numeric", "integer" or "logical". By default, values are
226-
converted to character, but if desired otherwise as indicated by the
227-
**atomicclass** element then values are coerces. Note that coercion is performed
228-
by the functions `as.character`, `as.numeric`, `as.integer` and `as.logical`
229-
respectively.
230-
231-
The version should correspond to that reported in the template itself,
232-
otherwise the file or the template is invalid. The user of this package
233-
should take care of this check.
234-
235-
236-
237-
## Writing a guide
238-
239206
### Required elements
240207

241208
- <kbd>guide.version</kbd>: the version of the guide
@@ -259,9 +226,34 @@ should take care of this check.
259226
to check the correctness of dimensions of the ranges of **platedata**
260227
elements.
261228

229+
The elements in **locations** indicate where data are to be found, whereas the translation
230+
part contains translations between long and short names for variables. Short
231+
names are used as variable names in the scripts, whereas long names may be used
232+
in the spreadsheet, in particular when these are visible to the user. In that
233+
case the names should be translated before using them in the script. Reverse
234+
translations may be used by the script in the output document.
235+
236+
## Locations
237+
238+
### Required elements
239+
240+
- <kbd>sheet</kbd>: the name of the sheet in which the data are to be found.
241+
- <kbd>type</kbd>: the format of the data in the range.
242+
- <kbd>translate</kbd>: (*true*, *false*) whether the variable names should be translated before use in the script.
243+
- <kbd>varname</kbd>: the name of the variable in which the data will be available in the script.
244+
- <kbd>ranges</kbd>: an array of ranges in which the data are to be found.
245+
246+
### Optional element
247+
248+
- <kbd>atomicclass</kbd>: the class of the data in the ranges. Can have values "character",
249+
"numeric", "integer", "logical" or "date". It can be either a singleton or an array of classes of the same
250+
length as the number of ranges. If a singleton then by default all values are converted to character.
251+
If an atomicclass is given then values are coerced. Coercion is performed by the functions `as.character`, `as.numeric`,
252+
`as.integer`, `as.logical`, respectively, or in case of a date, by a function that produces a Date object.
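
The coercion step can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's code: `coerce_values` is a hypothetical helper, and using `as.Date()` for the "date" class assumes the value arrives as ISO-formatted text ("YYYY-MM-DD"), which may not hold for dates read from Excel.

```r
# Hypothetical helper: coerce each value to the class named in atomicclass.
# A singleton atomicclass is recycled over all values.
coerce_values <- function(values, atomicclass = "character") {
  if (length(atomicclass) == 1) {
    atomicclass <- rep(atomicclass, length(values))
  }
  stopifnot(length(atomicclass) == length(values))
  Map(function(v, cls) {
    switch(cls,
      character = as.character(v),
      numeric   = as.numeric(v),
      integer   = as.integer(v),
      logical   = as.logical(v),
      date      = as.Date(v),  # assumption: ISO-formatted text input
      stop("unknown atomicclass: ", cls)
    )
  }, values, atomicclass)
}

raw <- list(conc = "1.5", accepted = "TRUE", exptdate = "2024-01-15")
typed <- coerce_values(raw, c("numeric", "logical", "date"))
```

Stopping on an unknown class name, rather than silently falling back to character, makes typos in the guide visible early.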
253+
262254
### Checking against the excelDataGuide json schema
263255

264-
Correctness of the structure of a YAML file like a data guide can be
256+
Correctness of the structure and syntax of a YAML file like a data guide can be
265257
checked against a JSON schema (See [json-schema-everywhere](https://json-schema-everywhere.github.io/yaml)).
266258
We provide a JSON schema called <kbd>excelguide_schema.json</kbd> in the folder
267259
<kbd>data-raw</kbd>. We use the [Polyglottal JSON Schema Validator](https://www.npmjs.com/package/pajv)
