Commit 13fe574

add draft for 1.x yml spec
1 parent 180e93b commit 13fe574

1 file changed

+297
-0
lines changed

spec/config/1.x.md

Lines changed: 297 additions & 0 deletions
@@ -0,0 +1,297 @@
1+
---
2+
layout: spec
3+
title: "[DRAFT of the next version] CODECHECK configuration file specification"
4+
version: 1.x
5+
date: 2025-02-20
6+
permalink: spec/config/1.x/
7+
#redirect_from:
8+
# - spec/config/1.x/
9+
---
10+
11+
⚠️ UNDER DEVELOPMENT, see <https://github.com/codecheckers/discussion/issues/3> for open ideas and questions ⚠️
12+
13+
## Introduction
14+
15+
The CODECHECK process describes a workflow for a reproduction of computations as part of a scientific peer review.
16+
CODECHECK follows a set of [principles](/) that allow for many variations in [concrete implementations](/process).
17+
The requirements for a successful CODECHECK are intentionally kept to a minimum, as are the requirements on how codechecking is conducted, or how the procedure is documented.
18+
A CODECHECK concludes with a CODECHECK report, a document written by the codechecker that is understandable to a person with some expertise in the scientific field of the related article.
19+
Besides the human-readable information in the CODECHECK report, a small set of metadata elements that are part of a CODECHECK procedure is worth capturing in a more structured format.
20+
21+
This metadata is saved in the CODECHECK configuration file, which is specified in this document in version {{ page.version }}.
22+
The CODECHECK configuration file can serve as the identifier of a [CODECHECK bundle]({{ '/guide/bundle' | absolute_url }}), i.e., all the files that are part of a CODECHECK.
23+
The CODECHECK bundle is _not_ formally specified, as its contents are largely at the discretion of the codechecker.
24+
The CODECHECK configuration file, however, is formally specified to enable automated extraction and development of tools to support codechecking.
25+
Both the author and the codechecker contribute information to the configuration file.
26+
27+
In the future, this information can enable both meta-research about code in peer review and more user-friendly assistance systems for authors, codecheckers, and publishers' staff.
28+
29+
> Note
30+
>
31+
> This specification is the result of a collaborative scientific [project](/project).
32+
> Help improve it by [providing your feedback](https://github.com/orgs/codecheckers/discussions).
33+
34+
> Notational conventions
35+
>
36+
> The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in [RFC 2119](https://tools.ietf.org/html/rfc2119).
37+
38+
## tl;dr for authors
39+
40+
If you are an _author_ and just want to get the minimal `codecheck.yml` file prepared for the [community workflow]({{ 'guide/community-workflow' | absolute_url }}) [without reading the whole technical specification](https://en.wikipedia.org/wiki/Wikipedia:Too_long;_didn%27t_read), then please use the following template.
41+
The template has a bit more than the strictly mandatory fields, but these extra fields are important for you as a creator to get credit.
42+
Please validate your final configuration file with a [YAML Validator](https://codebeautify.org/yaml-validator).
43+
44+
~~~yaml
45+
---
46+
version: {{ page.url | absolute_url }}
47+
48+
manifest:
49+
  - file: name of output file 1 (e.g. figure.pdf)
50+
    comment: short description of output file, e.g. ('Figure 1 in the paper', 'Result of running model variant A')
51+
  - file: result.csv
52+
    comment: short description of output file, e.g. ('Figure 1 in the paper', 'Result of running model variant A')
53+
54+
paper:
55+
  title: "A good paper"
56+
  authors:
57+
    - name: Josiah Carberry
58+
      ORCID: 0000-0002-1825-0097
59+
  reference: https://doi.org/preprint.1
60+
~~~
61+
62+
--------
63+
64+
## Format, name and encoding
65+
66+
Format: [YAML 1.1](https://yaml.org/spec/1.1/current.html) _or later_
67+
68+
Name: **`codecheck.yml`**
69+
70+
The file MUST be encoded in UTF-8.
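
The name and encoding requirements can be checked mechanically. The following is a minimal sketch in Python, not part of the specification; the function name `check_name_and_encoding` is hypothetical, and it assumes the tool has the file path and the raw bytes at hand.

```python
from pathlib import PurePosixPath

EXPECTED_NAME = "codecheck.yml"  # file name required by this specification


def check_name_and_encoding(path: str, raw: bytes) -> list[str]:
    """Return a list of problems; an empty list means both checks passed.

    Hypothetical helper for illustration only.
    """
    problems = []
    # The configuration file must carry the fixed name.
    if PurePosixPath(path).name != EXPECTED_NAME:
        problems.append(f"file must be named {EXPECTED_NAME}")
    # The content must decode as UTF-8.
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"file is not valid UTF-8: {err.reason}")
    return problems
```

An empty result for `check_name_and_encoding("bundle/codecheck.yml", data)` means the file passes both checks; parsing the YAML itself is a separate step.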
71+
72+
## Versioning
73+
74+
This document specifies version `{{ page.version }}`.
75+
The specification uses a `major.minor` semantic versioning scheme.
76+
Non-breaking changes can be introduced in a minor version release.
77+
The latest version of the specification can be found at [{{ '/spec/config/latest' | absolute_url }}]({{ '/spec/config/latest' | absolute_url }}).
78+
79+
## Storage location
80+
81+
The `codecheck.yml` is stored at the root of the project folder where all files related to the CODECHECK are saved.
82+
The folder where the `codecheck.yml` file is stored is called the _CODECHECK bundle_.
83+
It is the folder that also includes a directory `codecheck` (or `.codecheck`) for all files created during codechecking, see [CODECHECK bundle documentation]({{ '/guide/bundle' | absolute_url }}).
84+
85+
## Content
86+
87+
### Explicit document and directive
88+
89+
The file MUST include three dashes (`---`), the document start marker, to separate the directives from the document content.
90+
91+
The file SHOULD define the YAML version in the directive.
92+
While YAML supports [bare](https://yaml.org/spec/1.2.2/#913-bare-documents) (YAML 1.2) or [implicit](https://yaml.org/spec/1.1/current.html#id898031) (YAML 1.1) documents, an explicit indication of the format is preferable for the CODECHECK use case.
93+
Clarity is better.
94+
A `codecheck.yml` file is therefore an _explicit document_.
95+
96+
### Version
97+
98+
The file SHOULD include a root-level node `version` with a URL denoting the used version of the CODECHECK configuration file specification.
99+
If no version is provided, software tools SHOULD assume the [latest version]({{ '/spec/config/latest' | absolute_url }}), but they MAY instead abort processing the `codecheck.yml` with an informative message.
100+
101+
> Example
102+
103+
> ~~~yaml
104+
> %YAML 1.1
105+
> ---
106+
> version: {{ page.url | absolute_url }}
107+
> ~~~
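
The two permitted tool behaviours for a missing `version` node could look as follows. This is a sketch under assumptions: `config` is the already parsed YAML document as a Python dictionary, the function name `resolve_version` and the `strict` flag are hypothetical, and `LATEST_SPEC_URL` is a placeholder because the absolute site URL is not stated here.

```python
# Placeholder for the absolute URL of '/spec/config/latest' (assumption).
LATEST_SPEC_URL = "https://example.org/spec/config/latest"


def resolve_version(config: dict, strict: bool = False) -> str:
    """Return the specification version URL to use for a parsed codecheck.yml.

    With strict=False a missing version falls back to the latest version;
    with strict=True processing is aborted with an informative message.
    """
    version = config.get("version")
    if version:
        return str(version)
    if strict:
        raise ValueError(
            "codecheck.yml has no 'version' node; "
            "add the URL of the specification version that was used"
        )
    return LATEST_SPEC_URL
```

A lenient tool would call `resolve_version(config)`, while a validator might prefer `resolve_version(config, strict=True)`.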
108+
109+
### Manifest list
110+
111+
The configuration file MUST have a root-level sequence (i.e., a list) of files called `manifest` that form the _manifest_.
112+
All files that are part of the manifest must be recreated during a CODECHECK.
113+
114+
Each manifest sequence item MUST have a node `file` providing the relative path to a file that is part of the computational workflow.
115+
The relative paths MUST be relative to the location of the `codecheck.yml`.
116+
Each manifest sequence item MAY have a node `comment` with human-readable information about said file.
117+
118+
> Example
119+
>
120+
> ~~~yaml
121+
> ---
122+
> version: {{ page.url | absolute_url }}
123+
>
124+
> manifest:
125+
>   - file: outputData.csv
126+
>     comment: data/output/one.csv
127+
>   - file: fig1.pdf
128+
>   - file: resultVectors.txt
129+
>   - file: appendix_figures.pdf
130+
>     comment: "appendix of paper, starting at page 12"
131+
> ~~~
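
The manifest rules above translate into a short structural check. The sketch below is illustrative only: it assumes the YAML has already been parsed into a Python dictionary, treats paths as POSIX-style, and uses the hypothetical function name `check_manifest`.

```python
from pathlib import PurePosixPath


def check_manifest(config: dict) -> list[str]:
    """Return a list of problems found in the root-level 'manifest' sequence."""
    manifest = config.get("manifest")
    if not isinstance(manifest, list):
        return ["root-level 'manifest' sequence is missing"]
    problems = []
    for index, item in enumerate(manifest):
        # Every item needs a 'file' node; 'comment' is optional and not checked.
        file = item.get("file") if isinstance(item, dict) else None
        if not isinstance(file, str) or not file:
            problems.append(f"manifest item {index}: missing 'file'")
        elif PurePosixPath(file).is_absolute():
            # Paths are relative to the location of codecheck.yml.
            problems.append(f"manifest item {index}: 'file' must be a relative path")
    return problems
```

The check does not verify that the listed files exist, since they are only recreated during the CODECHECK.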
132+
133+
### Author and submission metadata
134+
135+
The configuration file SHOULD include minimal metadata about the paper, i.e. the title and the author(s) of the paper whose workflow is submitted to the CODECHECK.
136+
For this information, the configuration file SHOULD have a root-level node `paper`.
137+
This information might be added after the CODECHECK or edited, e.g., after publication, therefore all sub-elements are optional.
138+
139+
The element `paper` SHOULD have a child item `title` with the title of the submission or publication.
140+
141+
The element `paper` SHOULD have a child sequence `authors`.
142+
The child nodes of the `authors` sequence are called "author items".
143+
There MUST be at least one author item, which is the corresponding author of the workflow under review.
144+
The corresponding author MUST be the first author item in the `authors` sequence.
145+
However, "authors" may be interpreted very broadly: the sequence is not limited to the authors of the paper and can include all types of contributors, e.g., software engineers, infrastructure service staff, etc.
146+
147+
Each author item MUST have a child `name` with the author's name.
148+
Each author item MUST have a child `ORCID` with the author's [ORCID identifier](https://support.orcid.org/hc/en-us/articles/360006897674).
149+
The value of the `ORCID` node MUST be the plain ORCID, e.g., `0000-0000-0000-0000`, without URL prefix (i.e., without `https://orcid.org/...`).
150+
151+
If the workflow accompanies a preprinted article or concerns an article under review, a reference to the article SHOULD be put in the node `reference` under the root-level node `paper`.
152+
Ideally the identifier is a [DOI](https://en.wikipedia.org/wiki/Digital_object_identifier) in the form of a resolvable URL, or an identifiable text string such as `arXiv:2001.10641`, or a short text "Under review at X"/"Paper to appear in Y".
153+
154+
> Example
155+
>
156+
> ~~~yaml
157+
> ---
158+
> # [...]
159+
>
160+
> paper:
161+
>   title: "A good paper"
162+
>   authors:
163+
>     - name: Josiah Carberry
164+
>       ORCID: 0000-0002-1825-0097
165+
>     - name: John Doe
166+
>   reference: https://doi.org/preprint.1
167+
> ~~~
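
A tool could check the plain ORCID form with a regular expression. The sketch below uses hypothetical function names. The second function goes beyond this specification: it applies the ISO 7064 MOD 11-2 check digit rule that ORCID documents for its identifiers, which the specification does not require.

```python
import re

# Four groups of four characters; the last character may be the check digit X.
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def is_plain_orcid(value: str) -> bool:
    """True if the value is a plain ORCID without any URL prefix."""
    return ORCID_PATTERN.fullmatch(value) is not None


def has_valid_check_digit(orcid: str) -> bool:
    """Check the final character of a plain ORCID (ISO 7064 MOD 11-2).

    Optional extra, not demanded by this specification.
    """
    digits = orcid.replace("-", "")
    total = 0
    for char in digits[:15]:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[15] == expected
```

Keeping the two checks separate lets a validator report a URL-prefixed value differently from a mistyped identifier.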
168+
169+
<a name='source'>&nbsp;</a>
170+
171+
The configuration file MAY have a root-level node `source` with a textual description or a single URL to describe the source of the checked material.
172+
The field SHOULD be used if the material used for the check is drawn from multiple sources, so that the `repository` node (see [Codecheck metadata](#codecheck-metadata)) and the metadata accessible via that URL cannot sufficiently describe the provenance of code or data files.
173+
174+
> Example
175+
>
176+
> ~~~yaml
177+
> ---
178+
> # [...]
179+
>
180+
> source: Data is available at https://download.url/dataset/123456/v2 and code can be found in an attachment to the submitted manuscript.
181+
> ~~~
182+
183+
### Codecheck metadata
184+
185+
Further important metadata is created during the CODECHECK process.
186+
The `codecheck.yml` started by the author is extended with this information by the codechecker.
187+
If a codechecker changes the meaning of any content provided by the author in the configuration file, they SHOULD clearly mark these changes in the form of a comment, in addition to a transparent record through the file being under version control.
188+
189+
The configuration file MUST include minimal metadata about the codechecker in a root-level sequence `codechecker` with at least one child element.
190+
Each item in the `codechecker` sequence MUST have one node `name` with the codechecker's name.
191+
Each item in the `codechecker` sequence SHOULD have a child `ORCID` as defined in [Author and submission metadata](#author-and-submission-metadata).
192+
193+
The configuration file MUST have a root-level node `report` with a unique identifier for the published CODECHECK report, such as a URL or DOI, ideally in a resolvable format.
194+
195+
The codechecker MAY add further fields with the following names and semantics:
196+
197+
- `summary`: Short textual summary of the CODECHECK report.
198+
- `repository`: A URL or a list of URLs to the code or data repository/ies where more files and a version history of the checked workflow are available.
199+
- `source`: see [`source`](#source).
200+
- `check_time`: A date or timestamp when the CODECHECK was completed. If no time is provided, it should be assumed that codechecking was completed on the publication date of the CODECHECK report.
201+
- `certificate`: A unique identifier for the certificate as awarded in the [CODECHECK register](https://github.com/codecheckers/register/).
202+
203+
> Example
204+
>
205+
> ~~~yaml
206+
> ---
207+
> manifest:
208+
>   - file: outputData.csv
209+
>     comment: data/output/one.csv
210+
>   - file: fig1.pdf
211+
>
212+
> codechecker:
213+
>   - name: S. Eglen
214+
>     ORCID: 0000-0001-8607-8025
215+
>   - name: Daniel N.
216+
>     ORCID: 0000-0002-0024-5046
217+
>
218+
> report: https://doi.org/10.5281/zenodo.3674056
219+
> summary: |
220+
>   The check was straightforward as all material was provided and documented well, but computations took about 3 hours to run.
221+
> repository: https://github.com/codecheckers/Piccolo-2020
222+
> check_time: "2019-01-01 13:00:00"
223+
> certificate: 2020-001
224+
> ~~~
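
The mandatory codechecker metadata can be verified in the same style. This is a sketch only, assuming a parsed YAML dictionary and using the hypothetical function name `check_codecheck_metadata`; it covers the MUST-level rules for `codechecker` and `report` and ignores the optional fields.

```python
def check_codecheck_metadata(config: dict) -> list[str]:
    """Return a list of problems with the 'codechecker' and 'report' nodes."""
    problems = []
    codecheckers = config.get("codechecker")
    if not isinstance(codecheckers, list) or not codecheckers:
        problems.append("'codechecker' must be a sequence with at least one item")
    else:
        for index, item in enumerate(codecheckers):
            # Each codechecker item needs a name; ORCID is only recommended.
            if not isinstance(item, dict) or not item.get("name"):
                problems.append(f"codechecker item {index}: missing 'name'")
    # The report identifier (URL or DOI) is mandatory.
    if not config.get("report"):
        problems.append("root-level 'report' identifier is missing")
    return problems
```

Such a check is only meaningful once the codechecker has extended the file; an author's initial `codecheck.yml` is expected to lack these nodes.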
225+
226+
### Additional content
227+
228+
The file `codecheck.yml` may include any number of other nodes or sequences to support specific instances of a CODECHECK process.
229+
For clarity, these SHOULD be named in a way that clearly identifies the origin and use case, e.g., by prepending a common prefix to node names or using a single parent node.
230+
231+
> Example
232+
>
233+
> ~~~yaml
234+
> ---
235+
> # [...]
236+
>
237+
> publishing_inc_identifier: 12345
238+
> publishing_inc_handler: Ed Editor
239+
>
240+
> TheBestRepository:
241+
>   recordId: 1a2b3c
242+
>   checksum: cdce90c878462d073b31aec21ccee48e3366250a6baafd215fa73d1c6bc0357b
243+
> ~~~
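
Tools may want to separate the nodes defined in this document from such additional content. The following sketch assumes a parsed YAML dictionary; the set `SPEC_KEYS` is collected from the root-level node names used in this draft, and the function name `extension_keys` is hypothetical.

```python
# Root-level node names defined in this draft of the specification.
SPEC_KEYS = {
    "version", "manifest", "paper", "source", "codechecker",
    "report", "summary", "repository", "check_time", "certificate",
}


def extension_keys(config: dict) -> list[str]:
    """Return the sorted root-level keys that are not defined by the specification."""
    return sorted(key for key in config if key not in SPEC_KEYS)
```

For the example above, the function returns `TheBestRepository`, `publishing_inc_handler`, and `publishing_inc_identifier`, which a tool could pass through untouched or report to the user.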
244+
245+
## Minimal example
246+
247+
~~~yaml
248+
---
249+
manifest:
250+
  - file: fig1.pdf
251+
252+
FIXME!
253+
~~~
254+
255+
## Full example
256+
257+
~~~yaml
258+
%YAML 1.1
259+
---
260+
version: {{ page.url | absolute_url }}
261+
262+
manifest:
263+
  - file: outputData.csv
264+
    comment: originally stored at data/output/one.csv
265+
  - file: fig1.pdf
266+
    comment: Figure 1
267+
  - file: resultVectors.txt
268+
    comment: output vectors in plain text format
269+
  - file: appendix_figures.pdf
270+
    comment: "appendix of paper, starting at page 12"
271+
272+
paper:
273+
  title: "A good paper"
274+
  authors:
275+
    - name: Josiah Carberry
276+
      ORCID: 0000-0002-1825-0097
277+
    - name: John Doe
278+
  reference: https://doi.org/preprint.1
279+
280+
codechecker:
281+
  - name: S. Eglen
282+
    ORCID: 0000-0001-8607-8025
283+
284+
report: https://doi.org/abcde.12345
285+
286+
summary: |
287+
  The check was straightforward as all material was provided and documented well, but computations took about 3 hours to run.
288+
289+
  The created figures seem to match the ones provided in the article. The content of other output files was not checked.
290+
repository:
291+
  - https://github.com/codecheckers/example-workflow
292+
  - https://github.com/codecheckers/example-data
293+
check_time: "2019-01-01 13:00:00"
294+
certificate: 2020-999
295+
~~~
296+
297+
More examples can be found in the repositories of the codecheckers organisation on GitHub: [https://github.com/codecheckers/](https://github.com/codecheckers/).
