---
layout: spec
title: "[DRAFT of the next version] CODECHECK configuration file specification"
version: 1.x
date: 2025-xx-xx
permalink: spec/config/1.x/
redirect_from:
  - spec/config/1.x/
---

⚠️ UNDER DEVELOPMENT, see <https://github.com/codecheckers/discussion/issues/3> for open ideas and questions ⚠️

## Introduction

The CODECHECK process describes a workflow for reproducing computations as part of scientific peer review.
CODECHECK follows a set of [principles](/) that allow for many different [concrete implementations](/process).
The requirements for a successful CODECHECK are intentionally kept to a minimum, as are the requirements on how codechecking is conducted and how the procedure is documented.
The outcome of a CODECHECK is a CODECHECK report, written by the codechecker and understandable to a person with some expertise in the scientific field of the related article.
Besides the human-readable information in the CODECHECK report, a CODECHECK procedure involves a small set of metadata elements that are worth capturing in a more structured format.

This metadata is saved in the CODECHECK configuration file, which is specified in this document in version {{ page.version }}.
The CODECHECK configuration file can serve as the identifier of a [CODECHECK bundle]({{ '/guide/bundle' | absolute_url }}), i.e., all the files that are part of a CODECHECK.
The CODECHECK bundle is _not_ formally specified, as its contents are largely at the discretion of the codechecker.
The CODECHECK configuration file, however, is formally specified to enable automated extraction and development of tools to support codechecking.
Both the author and the codechecker contribute information to the configuration file.

In the future, this information can enable both meta-research about code in peer review and more user-friendly assistance systems for authors, codecheckers, and publishers' staff.

> Note
>
> This specification is the result of a collaborative scientific [project](/project).
> Help improve it by [providing your feedback](https://github.com/orgs/codecheckers/discussions).

> Notational conventions
>
> The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in [RFC 2119](https://tools.ietf.org/html/rfc2119).

## tl;dr for authors

If you are an _author_ and just want to prepare a minimal `codecheck.yml` file for the [community workflow]({{ 'guide/community-workflow' | absolute_url }}) [without reading the whole technical specification](https://en.wikipedia.org/wiki/Wikipedia:Too_long;_didn%27t_read), please use the following template.
The template contains a few more than the strictly mandatory fields, but these extra fields are important for you as a creator to receive credit.
Please validate your final configuration file with a [YAML Validator](https://codebeautify.org/yaml-validator).

~~~yaml
---
version: {{ page.url | absolute_url }}

manifest:
  - file: name of output file 1 (e.g. figure.pdf)
    comment: short description of output file, e.g. ('Figure 1 in the paper', 'Result of running model variant A')
  - file: result.csv
    comment: short description of output file, e.g. ('Figure 1 in the paper', 'Result of running model variant A')

paper:
  title: "A good paper"
  authors:
    - name: Josiah Carberry
      ORCID: 0000-0002-1825-0097
  reference: https://doi.org/preprint.1
~~~

--------

## Format, name and encoding

Format: [YAML 1.1](https://yaml.org/spec/1.1/current.html) _or later_

Name: **`codecheck.yml`**

The file MUST be encoded in UTF-8.

## Versioning

This document specifies version `{{ page.version }}`.
The specification uses a `major.minor` semantic versioning scheme.
Non-breaking changes can be introduced in a minor version release.
The latest version of the specification can be found at [{{ '/spec/config/latest' | absolute_url }}]({{ '/spec/config/latest' | absolute_url }}).

## Storage location

The `codecheck.yml` file is stored at the root of the project folder that holds all files related to the CODECHECK.
The folder where the `codecheck.yml` file is stored is called the _CODECHECK bundle_.
This folder also includes a directory `codecheck` (or `.codecheck`) for all files created during codechecking; see the [CODECHECK bundle documentation]({{ '/guide/bundle' | absolute_url }}).
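
A tool working on a bundle might first locate these two elements. The following is a minimal Python sketch that assumes the bundle root is given as a directory path. The function name `locate_bundle_files` is hypothetical. Preferring `codecheck` over `.codecheck` when both exist is also an assumption, because this specification only names them as alternatives.

```python
from pathlib import Path
from typing import Optional, Tuple


def locate_bundle_files(bundle: Path) -> Tuple[Path, Optional[Path]]:
    """Return the path of codecheck.yml and of the codechecking directory.

    Hypothetical helper. Assumes `bundle` is the folder that holds
    codecheck.yml. If both 'codecheck' and '.codecheck' exist, the first
    one wins (an assumption; the specification names them as alternatives).
    """
    config = bundle / "codecheck.yml"
    if not config.is_file():
        raise FileNotFoundError(f"no codecheck.yml at the root of {bundle}")
    for name in ("codecheck", ".codecheck"):
        candidate = bundle / name
        if candidate.is_dir():
            return config, candidate
    # No codechecking directory yet, e.g. before the check has started.
    return config, None
```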

## Content

### Explicit document and directive

The file MUST include three dashes (`---`), the document start marker, to separate the directive from the document content.

The file SHOULD declare the YAML version in the directive.
While YAML supports [bare](https://yaml.org/spec/1.2.2/#913-bare-documents) (YAML 1.2) or [implicit](https://yaml.org/spec/1.1/current.html#id898031) (YAML 1.1) documents, an explicit indication of the format is preferable for the CODECHECK use case because it is clearer.
A `codecheck.yml` file is therefore an _explicit document_.
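
A tool could enforce the encoding and document start rules before handing the text to a YAML parser. The hypothetical Python functions below sketch one way to do this. They read the file strictly as UTF-8 and scan the lines that precede the document start marker. They assume the file contains a single document and do not validate any YAML syntax beyond the header.

```python
import re
from pathlib import Path
from typing import Optional

# Matches a YAML directive such as "%YAML 1.1" and captures the version.
YAML_DIRECTIVE = re.compile(r"%YAML\s+(\d+\.\d+)")


def read_config_text(path: Path) -> str:
    """Read codecheck.yml; raises UnicodeDecodeError if it is not UTF-8.

    'utf-8-sig' tolerates an optional byte order mark but still rejects
    any byte sequence that is not valid UTF-8.
    """
    return path.read_bytes().decode("utf-8-sig")


def check_explicit_document(text: str) -> Optional[str]:
    """Return the version from the %YAML directive, or None if absent.

    Raises ValueError if document content appears before the '---' marker.
    """
    version = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or line.lstrip().startswith("#"):
            continue  # blank lines and comments may precede the marker
        if line.startswith("%"):
            match = YAML_DIRECTIVE.match(line)
            if match:
                version = match.group(1)
            continue  # other directives, e.g. %TAG, are skipped
        if line == "---" or line.startswith("--- "):
            return version
        raise ValueError(f"content before document start marker: {line!r}")
    raise ValueError("no document start marker '---' found")
```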

### Version

The file SHOULD include a root-level node `version` with a URL denoting the version of the CODECHECK configuration file specification that the file follows.
If no version is provided, software tools SHOULD assume the [latest version]({{ '/spec/config/latest' | absolute_url }}), but they MAY also abort processing the `codecheck.yml` with an informative message.

> Example
>
> ~~~yaml
> %YAML 1.1
> ---
> version: {{ page.url | absolute_url }}
> ~~~
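
Both behaviours that this section allows for a missing `version` can be captured in a few lines. This Python sketch assumes the file has already been parsed into a dictionary, for example with PyYAML's `yaml.safe_load`, which implements YAML 1.1. Several details are illustrative only: the names `resolve_spec_version` and `major_version`, the `strict` switch, and the assumption that version URLs end in `/spec/config/<major>.<minor>/` like this page's permalink.

```python
import re
from typing import Optional

# Assumption: version URLs end like this page's permalink, e.g. .../spec/config/1.0/
VERSION_PATH = re.compile(r"/spec/config/(\d+)\.(\d+|x)/?$")


def resolve_spec_version(config: dict, latest_url: str, strict: bool = False) -> str:
    """Return the specification URL that applies to a parsed codecheck.yml."""
    version = config.get("version")
    if version is None:
        if strict:
            # Aborting with an informative message is permitted.
            raise ValueError(
                "codecheck.yml has no 'version' node; add the URL of the "
                f"specification version you follow, e.g. {latest_url}"
            )
        return latest_url  # default behaviour: assume the latest version
    if not isinstance(version, str) or not version.startswith(("http://", "https://")):
        raise ValueError(f"'version' must be a URL, got {version!r}")
    return version


def major_version(url: str) -> Optional[int]:
    """Extract the major version number from a version URL, if present."""
    match = VERSION_PATH.search(url)
    return int(match.group(1)) if match else None
```

A tool written against major version 1 could, for instance, warn when `major_version` returns a larger number. The versioning scheme only promises non-breaking changes within a major version.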

### Manifest list

The configuration file MUST have a root-level sequence (i.e., a list) called `manifest` whose items form the _manifest_.
All files that are part of the manifest must be recreated during a CODECHECK.

Each manifest sequence item MUST have a node `file` providing the relative path to a file that is part of the computational workflow.
The relative paths MUST be relative to the location of the `codecheck.yml`.
Each manifest sequence item MAY have a node `comment` with human-readable information about the file.

> Example
>
> ~~~yaml
> ---
> version: {{ page.url | absolute_url }}
>
> manifest:
>   - file: outputData.csv
>     comment: originally stored at data/output/one.csv
>   - file: fig1.pdf
>   - file: resultVectors.txt
>   - file: appendix_figures.pdf
>     comment: "appendix of paper, starting at page 12"
> ~~~
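
The manifest rules lend themselves to automated checks. The following Python sketch assumes an already-parsed configuration dictionary. It reports items without a `file` node and items with absolute paths. Once the check has been run, it can also report files that are missing relative to the folder of `codecheck.yml`. The function name is hypothetical, and treating an empty manifest as an error is an assumption.

```python
from pathlib import Path, PurePosixPath
from typing import List


def manifest_problems(config: dict, bundle: Path, check_exists: bool = True) -> List[str]:
    """List problems with the 'manifest' of a parsed codecheck.yml."""
    manifest = config.get("manifest")
    if not isinstance(manifest, list) or not manifest:
        # Assumption: an empty manifest is treated as an error.
        return ["root-level 'manifest' must be a non-empty sequence"]
    problems = []
    for index, item in enumerate(manifest):
        if not isinstance(item, dict) or "file" not in item:
            problems.append(f"manifest item {index} has no 'file' node")
            continue
        path = PurePosixPath(str(item["file"]))
        if path.is_absolute():
            problems.append(f"{path}: path must be relative to codecheck.yml")
        elif check_exists and not (bundle / path).is_file():
            # After codechecking, every manifest file should exist in the bundle.
            problems.append(f"{path}: file not found in the bundle")
    return problems
```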

### Author and submission metadata

The configuration file SHOULD include minimal metadata about the paper, i.e., the title and the author(s) of the paper whose workflow is submitted to the CODECHECK.
For this information, the configuration file SHOULD have a root-level node `paper`.
Because this information might be added or edited after the CODECHECK, e.g., after publication, all sub-elements are optional.

The element `paper` SHOULD have a child node `title` with the title of the submission or publication.

The element `paper` SHOULD have a child sequence `authors`.
The items of the `authors` sequence are called _author items_.
There MUST be at least one author item, which is the corresponding author of the workflow under review.
The corresponding author MUST be the first author item in the `authors` sequence.
The term "authors" is meant broadly: besides the authors of the paper, the sequence can include all types of contributors, e.g., software engineers, infrastructure service staff, etc.

Each author item MUST have a child `name` with the author's name.
Each author item MUST have a child `ORCID` with the author's [ORCID identifier](https://support.orcid.org/hc/en-us/articles/360006897674).
The value of `ORCID` MUST be the plain ORCID identifier, e.g., `0000-0000-0000-0000`, without a URL prefix (i.e., without `https://orcid.org/...`).

If the workflow accompanies a preprint or concerns an article under review, a reference to the article SHOULD be put in the node `reference` under the root-level node `paper`.
Ideally, the reference is a [DOI](https://en.wikipedia.org/wiki/Digital_object_identifier) in the form of a resolvable URL; alternatively, it can be an identifiable text string such as `arXiv:2001.10641`, or a short text such as "Under review at X" or "Paper to appear in Y".

> Example
>
> ~~~yaml
> ---
> # [...]
>
> paper:
>   title: "A good paper"
>   authors:
>     - name: Josiah Carberry
>       ORCID: 0000-0002-1825-0097
>     - name: John Doe
>   reference: https://doi.org/preprint.1
> ~~~
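
A tool could check the author items along the following lines. The Python sketch assumes a parsed `paper` dictionary. It applies the format rule stated above (a plain identifier without URL prefix) and also verifies the ORCID check digit, which ORCID computes with the ISO 7064 MOD 11-2 algorithm. The checksum test goes beyond this specification, and the function names are hypothetical.

```python
import re
from typing import List

# Plain ORCID: four blocks of four characters; the last one may be 'X'.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_checksum_ok(orcid: str) -> bool:
    """Verify the ORCID check digit (ISO 7064 MOD 11-2)."""
    digits = orcid.replace("-", "")
    total = 0
    for char in digits[:-1]:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def author_problems(paper: dict) -> List[str]:
    """List problems with the author items of a parsed 'paper' node."""
    authors = paper.get("authors")
    if authors is None:
        return []  # 'authors' is only RECOMMENDED
    if not isinstance(authors, list) or not authors:
        return ["'authors' needs at least one item, the corresponding author first"]
    problems = []
    for index, author in enumerate(authors):
        if not isinstance(author, dict) or not author.get("name"):
            problems.append(f"author item {index} has no 'name'")
            continue
        orcid = author.get("ORCID")
        if not orcid:
            problems.append(f"{author['name']}: missing ORCID")
            continue
        orcid = str(orcid)
        if orcid.startswith("http"):
            problems.append(f"{author['name']}: use the plain ORCID without URL prefix")
        elif not ORCID_PATTERN.match(orcid) or not orcid_checksum_ok(orcid):
            problems.append(f"{author['name']}: {orcid} is not a valid ORCID")
    return problems
```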

<a name='source'> </a>

The configuration file MAY have a root-level node `source` with a textual description or a single URL describing the source of the checked material.
The field SHOULD be used if the material used for the check is drawn from multiple sources, so that the `repository` node (see [Codecheck metadata](#codecheck-metadata)) and the metadata accessible via its URL cannot sufficiently describe the provenance of code or data files.

> Example
>
> ~~~yaml
> ---
> # [...]
>
> source: Data is available at https://download.url/dataset/123456/v2 and code can be found in an attachment to the submitted manuscript.
> ~~~

### Codecheck metadata

Further important metadata is created during the CODECHECK process.
The codechecker extends the `codecheck.yml` started by the author with this information.
If a codechecker changes the meaning of any content provided by the author in the configuration file, they SHOULD clearly mark these changes with a comment, in addition to the transparent record provided by keeping the file under version control.

The configuration file MUST include minimal metadata about the codechecker in a root-level sequence `codechecker` with at least one child element.
Each item in the `codechecker` sequence MUST have a node `name` with the codechecker's name.
Each item in the `codechecker` sequence SHOULD have a child `ORCID` as defined in [Author and submission metadata](#author-and-submission-metadata).

The configuration file MUST have a root-level node `report` with a unique identifier for the published CODECHECK report, such as a URL or DOI, ideally in a resolvable format.

The configuration file MAY include further root-level nodes with the following names and semantics:

- `summary`: Short textual summary of the CODECHECK report.
- `repository`: A URL or a list of URLs to the code or data repositories where more files and a version history of the checked workflow are available.
- `source`: see [`source`](#source).
- `check_time`: A date or timestamp when the CODECHECK was completed. If no time is provided, it should be assumed that codechecking was completed on the publication date of the CODECHECK report.
- `certificate`: A unique identifier for the certificate as awarded in the [CODECHECK register](https://github.com/codecheckers/register/).

> Example
>
> ~~~yaml
> ---
> manifest:
>   - file: outputData.csv
>     comment: originally stored at data/output/one.csv
>   - file: fig1.pdf
>
> codechecker:
>   - name: S. Eglen
>     ORCID: 0000-0001-8607-8025
>   - name: Daniel N.
>     ORCID: 0000-0002-0024-5046
>
> report: https://doi.org/10.5281/zenodo.3674056
> summary: |
>   The check was straightforward as all material was provided and documented well, but computations took about 3 hours to run.
> repository: https://github.com/codecheckers/Piccolo-2020
> check_time: "2019-01-01 13:00:00"
> certificate: 2020-001
> ~~~
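
Two of the optional fields need some normalisation before tools can use them. `repository` may be a single URL or a list, and a missing `check_time` falls back to the report's publication date. The following Python sketch assumes a parsed configuration. It also assumes the caller already knows the publication date, for example from the report's repository record. The function names are hypothetical.

```python
from datetime import date, datetime
from typing import List, Union


def repositories(config: dict) -> List[str]:
    """Normalise 'repository' (a single URL or a list of URLs) to a list."""
    value = config.get("repository")
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    if isinstance(value, list):
        return [str(url) for url in value]
    raise ValueError(f"'repository' must be a URL or a list of URLs, got {value!r}")


def effective_check_time(config: dict, report_published: date) -> Union[date, datetime]:
    """Return 'check_time', falling back to the report's publication date."""
    value = config.get("check_time")
    if value is None:
        return report_published
    if isinstance(value, date):
        # YAML 1.1 loaders may already turn unquoted timestamps into
        # date/datetime objects (datetime is a subclass of date).
        return value
    # Quoted values such as "2019-01-01 13:00:00" arrive as strings.
    return datetime.fromisoformat(str(value))
```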

### Additional content

The file `codecheck.yml` MAY include any number of other nodes or sequences to support specific instances of a CODECHECK process.
For clarity, these SHOULD be named in a way that clearly identifies their origin and use case, e.g., by prepending a common prefix to node names or by using a single parent node.

> Example
>
> ~~~yaml
> ---
> # [...]
>
> publishing_inc_identifier: 12345
> publishing_inc_handler: Ed Editor
>
> TheBestRepository:
>   recordId: 1a2b3c
>   checksum: cdce90c878462d073b31aec21ccee48e3366250a6baafd215fa73d1c6bc0357b
> ~~~
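
Additional nodes may appear at the root level, so a tool can separate the nodes defined in this specification from extensions and pass the latter on unchanged. The following is a minimal Python sketch that assumes a parsed configuration. The set of known keys is compiled from the root-level nodes named in this document, and the function name is hypothetical.

```python
from typing import Dict, Tuple

# Root-level nodes named in this specification.
SPEC_KEYS = frozenset({
    "version", "manifest", "paper", "source", "codechecker", "report",
    "summary", "repository", "check_time", "certificate",
})


def split_extensions(config: dict) -> Tuple[Dict, Dict]:
    """Separate specified root-level nodes from additional (extension) nodes."""
    core = {key: value for key, value in config.items() if key in SPEC_KEYS}
    extra = {key: value for key, value in config.items() if key not in SPEC_KEYS}
    return core, extra
```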

## Minimal example

~~~yaml
---
manifest:
  - file: fig1.pdf

codechecker:
  - name: S. Eglen

report: https://doi.org/abcde.12345
~~~
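
After the codechecker has added their metadata, a tool can check whether a file contains the root-level nodes that this specification marks as MUST: `manifest`, `codechecker` with a `name` for each item, and `report`. The following hypothetical Python check works on a parsed configuration. Treating an empty manifest as an error is an assumption.

```python
from typing import List


def required_problems(config: dict) -> List[str]:
    """Report violations of the MUST-level root nodes of codecheck.yml."""
    problems = []
    manifest = config.get("manifest")
    if (not isinstance(manifest, list) or not manifest
            or not all(isinstance(item, dict) and "file" in item for item in manifest)):
        problems.append("'manifest' must be a sequence of items with a 'file' node")
    checkers = config.get("codechecker")
    if not isinstance(checkers, list) or not checkers:
        problems.append("'codechecker' must be a sequence with at least one item")
    elif not all(isinstance(c, dict) and c.get("name") for c in checkers):
        problems.append("every 'codechecker' item must have a 'name'")
    if not config.get("report"):
        problems.append("'report' must identify the published CODECHECK report")
    return problems
```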

## Full example

~~~yaml
%YAML 1.1
---
version: {{ page.url | absolute_url }}

manifest:
  - file: outputData.csv
    comment: originally stored at data/output/one.csv
  - file: fig1.pdf
    comment: Figure 1
  - file: resultVectors.txt
    comment: output vectors in plain text format
  - file: appendix_figures.pdf
    comment: "appendix of paper, starting at page 12"

paper:
  title: "A good paper"
  authors:
    - name: Josiah Carberry
      ORCID: 0000-0002-1825-0097
    - name: John Doe
  reference: https://doi.org/preprint.1

codechecker:
  - name: S. Eglen
    ORCID: 0000-0001-8607-8025

report: https://doi.org/abcde.12345

summary: |
  The check was straightforward as all material was provided and documented well, but computations took about 3 hours to run.

  The created figures seem to match the ones provided in the article. The content of other output files was not checked.
repository:
  - https://github.com/codecheckers/example-workflow
  - https://github.com/codecheckers/example-data
check_time: "2019-01-01 13:00:00"
certificate: 2020-999
~~~

More examples can be found in the repositories of the codecheckers organisation on GitHub: [https://github.com/codecheckers/](https://github.com/codecheckers/).