Adding several entries in FAQ (#18)

jchiquet · mathurinm · gdurif · web-flow · commit f716ff39ec78 · 2023-06-22T17:20:21.000+02:00
* advances in faq * archiving post describing obsolete submission process * updating post for gh-page activation * post for other languages * reproducibility + long-running code * reproducibility + long-running code * split questions in FAQ, partially fixing #1 * Update _posts/2023-03-24-others-languages.md * Update _posts/2023-03-24-what-reproducibility.md * Update _posts/2023-06-21-data.md * add comments about other data repositories --------- Co-authored-by: mathurinm <mathurinm@users.noreply.github.com> Co-authored-by: gdurif <gd.dev@libertymail.net>
diff --git a/_config.yml b/_config.yml
@@ -238,8 +238,7 @@ jekyll-archives:
     tag: '/blog/tag/:name/'
     category: '/blog/category/:name/'
 
-display_tags: ['formatting', 'reproducibility', 'data', 'code'] # this tags will be dispalyed on the front page of your blog
-
+display_tags: ['formatting', 'reproducibility', 'data', 'code'] # this tags will be displayed on the front page of your blog
 # -----------------------------------------------------------------------------
 # Jekyll Scholar
 # -----------------------------------------------------------------------------
diff --git a/_pages/submit.md b/_pages/submit.md
@@ -70,10 +70,6 @@ if you are attached to Jupyter book or do not prefer to use Quarto, you are of c
 
 </div>
 
-## Data and large files
-
-If your submission materials contain files larger than 50MB, **especially data files**, they won’t fit on a git repository as is. For this reason, we encourage you to put your data or any materials you deem necessary on an external “open data” centered repository hub such a [Zenodo](https://zenodo.org/) or [OSF](https://osf.io/).
-
 ## Submit your work
 
 Once you are happy with your notebook AND the continuous integration (GitHub Action or GitLab CI) is successful, you must submit your PDF with [OpenReview, our platform for peer-reviewing](https://openreview.net/group?id=Computo).
diff --git a/_posts/2021-04-23-submission-process.md b/_posts/2021-04-23-submission-process.md
@@ -1,7 +1,7 @@
 ---
 layout: post
 title:  How does Computo work?
-date: 2021-04-23 00:00:00
+date: 3021-04-23 00:00:00
 description: Diagrams that describe the submission process
 ---
 
diff --git a/_posts/2023-03-17-HTML-to-website.md b/_posts/2023-03-17-HTML-to-website.md
@@ -15,8 +15,6 @@ We review here the full process for more clarity.
 
 If you used one of our template repositories, the build action (in `.github/workflows/build.yml`) should look like this:
 
-
-
 {% highlight yaml linenos %}
 name: build
 
diff --git a/_posts/2023-03-24-gitlab-integration.md b/_posts/2023-03-24-gitlab-integration.md
@@ -0,0 +1,10 @@
+---
+layout: post
+title: 'I use GitLab instead of GitHub: what should I do?'
+date: 2030-03-24 00:00:00
+tags: reproducibility
+description: Discuss how to integrate Computo contributions with GitLab instances
+---
+
+_Under Construction_
+
diff --git a/_posts/2023-03-24-others-languages.md b/_posts/2023-03-24-others-languages.md
@@ -0,0 +1,15 @@
+---
+layout: post
+title: 'I use a language other than Python, R or Julia: will Computo accept my contribution?'
+date: 2023-03-24 00:00:00
+tags: [reproducibility, code]
+description: Describe how to handle languages other than R, Julia or Python
+---
+
+In principle, we are open to any kind of language.
+
+In practice, we need to integrate reproducible and compilable code into our Quarto template. Natively, we support `R`, `Python` and `Julia`, and provide dedicated templates for them. For other languages, if a Jupyter kernel exists for your language ([there are kernels for many languages](https://gist.github.com/chronitis/682c4e0d9f663e85e3d87e97cd7d1624)), [Quarto can execute your code through it](https://quarto.org/docs/computations/execution-options.html#engine-binding).
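
To check whether a kernel for your language is installed on your machine, you can query Jupyter directly. The sketch below is only an illustration, not part of the Computo templates: it assumes the `jupyter` command is on your `PATH` and relies on the JSON printed by `jupyter kernelspec list --json`. The helper names `kernels_for_language` and `installed_kernelspecs` are hypothetical.

```python
import json
import subprocess


def kernels_for_language(kernelspecs: dict, language: str) -> list:
    """Return the names of installed Jupyter kernels whose spec declares `language`.

    `kernelspecs` is the parsed output of `jupyter kernelspec list --json`,
    i.e. {"kernelspecs": {name: {"resource_dir": ..., "spec": {...}}}}.
    The comparison is case-insensitive ("R" and "r" both match the IRkernel).
    """
    return sorted(
        name
        for name, info in kernelspecs.get("kernelspecs", {}).items()
        if info.get("spec", {}).get("language", "").lower() == language.lower()
    )


def installed_kernelspecs() -> dict:
    """Query the local Jupyter installation (requires `jupyter` on PATH)."""
    out = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return json.loads(out)
```

For instance, `kernels_for_language(installed_kernelspecs(), "haskell")` returns the names of the Haskell kernels available locally, if any. Note that the `language` string is declared by each kernel itself. A kernel name found this way is what the Quarto engine-binding documentation linked above refers to when selecting a Jupyter kernel for a document.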
+
+When writing your contribution, however, keep in mind that some languages are not designed for interactive use, so presenting your results in the manuscript may require extra formatting effort (possibly as much as interfacing your code with Python or R via `pybind11`, `Rcpp` or an equivalent tool). The choice is yours.
+
+On our side, we will do our best to help with the technical aspects of integrating any language, but the editorial board and reviewers will also check that the contribution is scientifically within scope and in the spirit of reproducibility.
diff --git a/_posts/2023-03-24-what-reproducibility.md b/_posts/2023-03-24-what-reproducibility.md
@@ -0,0 +1,11 @@
+---
+layout: post
+title: What is expected exactly in terms of reproducibility?
+date: 2023-04-24 00:00:00
+tags: reproducibility
+description: Discuss the different kinds of reproducibility at play in Computo, and what is expected from the authors.
+---
+
+Computo is not just about publishing a notebook and showing that it compiles with continuous integration (CI)! This part of the process is what we call _"Editorial Reproducibility"_. _"Scientific"_ or _"numerical"_ reproducibility of the analyses is also mandatory, on top of the classical peer-review evaluation.
+
+We don't ask authors to reproduce their data... yet! Nor do we ask for "bit-wise computational" reproducibility (i.e. obtaining exactly the same results bit by bit), but rather for "statistical" reproducibility, i.e. obtaining results that lead to the same conclusions, up to non-significant statistical variability.
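
As an illustration of the difference (a toy sketch in Python, not a Computo requirement), consider a Monte Carlo estimate of the mean of an Exponential(1) variable, whose true value is 1. Re-running with the same seed gives the very same number, at least on the same machine and Python version. Re-running with another seed gives a different number that nonetheless supports the same conclusion. The helper names and the "5 standard errors" tolerance are assumptions chosen for the example.

```python
import random
import statistics


def estimate_mean(seed: int, n: int = 10_000) -> float:
    """Monte Carlo estimate of the mean of an Exponential(1) variable (true value: 1)."""
    rng = random.Random(seed)  # dedicated generator: the result depends only on `seed`
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))


def same_conclusion(a: float, b: float, n: int = 10_000, n_se: float = 5.0) -> bool:
    """Two estimates agree if they differ by less than `n_se` standard errors.

    For Exponential(1) the standard deviation is 1, so each estimate has
    standard error 1/sqrt(n) and the difference of two independent
    estimates has standard error sqrt(2/n).
    """
    return abs(a - b) < n_se * (2 / n) ** 0.5


# Bit-wise reproducibility: same code, same seed -> the same floating-point number.
run1, run2 = estimate_mean(seed=42), estimate_mean(seed=42)

# Statistical reproducibility: another seed gives a different number,
# but one that leads to the same conclusion (the mean is about 1).
run3 = estimate_mean(seed=7)
```

Bit-wise equality such as `run1 == run2` can break across platforms or library versions, which is one reason why statistical agreement such as `same_conclusion(run1, run3)` is the more meaningful criterion.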
diff --git a/_posts/2023-06-21-data.md b/_posts/2023-06-21-data.md
@@ -0,0 +1,17 @@
+---
+layout: post
+title: I have large or sensitive data. How should I proceed?
+date: 2023-06-21 00:00:00
+tags: reproducibility
+description: Describe how to handle large or sensitive data files when submitting to Computo
+---
+
+## Large data sets
+
+If your submission materials contain files larger than 50 MB, **especially data files**, they won’t fit in a git repository as is. For this reason, we encourage you to put your data, or any materials you deem necessary, on an external “open data” repository hub such as [Zenodo](https://zenodo.org/) or [OSF](https://osf.io/).
+
+You can also use any long-term (emphasis on long-term) data repository that is standard in your scientific community (or for a specific type of data or scientific application), provided that the data can be retrieved straightforwardly from a script or from your notebook code. We strongly encourage the use of open platforms, ideally institutionally hosted.
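
For instance, the notebook can download the data when it is compiled, check its integrity and reuse a local copy on later runs. The following is a minimal sketch using only the Python standard library. It assumes that the repository publishes an MD5 checksum for each file, and the helper names `md5sum` and `fetch_data` are hypothetical. The URL in the usage note is a placeholder, not a real record.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path
from typing import Optional


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 checksum of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def fetch_data(url: str, dest: Path, expected_md5: Optional[str] = None) -> Path:
    """Download `url` to `dest`, unless a (verified) copy is already there."""
    dest = Path(dest)
    if dest.exists() and (expected_md5 is None or md5sum(dest) == expected_md5):
        return dest  # local copy: no re-download when the notebook is recompiled
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")  # never leave a half-written file at `dest`
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    if expected_md5 is not None and md5sum(tmp) != expected_md5:
        tmp.unlink()
        raise ValueError(f"checksum mismatch for {url}")
    tmp.replace(dest)
    return dest
```

In a notebook, this would be called as `fetch_data("https://zenodo.org/records/XXXXXXX/files/data.csv", Path("data/data.csv"), expected_md5="...")`, with the record URL and checksum copied from the repository page.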
+
+## Sensitive data sets
+
+Since the reproducibility of numerical results is a necessary condition for publication in *Computo*, your submission must include all necessary data (e.g. via Zenodo repositories). However, if you have sensitive data (for example, biomedical data that must be anonymized), you are invited to contact the editorial committee to explain and justify your position. In any case, we will ask you to make at least a sample of the original data public, together with a demonstration of its use in your Computo article. The results of the analyses carried out on the full data set should be made available as a binary file, from which the statistical summaries needed to support your claims can be produced.
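
One possible organization is sketched below in Python. This is an illustration, not a prescribed format. The authors run the analysis privately on the full data set and export only the aggregated results to a binary file with `pickle`. The public notebook then loads that file to produce its summaries. The helper names and the choice of statistics are hypothetical, and aggregation alone does not guarantee anonymization.

```python
import pickle
import statistics
from pathlib import Path


def summarize(values) -> dict:
    """Aggregate statistics only: no individual record is stored."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
        "quartiles": statistics.quantiles(values, n=4),
    }


# Private side: run by the authors on the full, sensitive data set.
def export_results(values, path: Path) -> None:
    with open(path, "wb") as f:
        pickle.dump(summarize(values), f)


# Public side: inside the Computo notebook, next to the demo on the public sample.
def load_results(path: Path) -> dict:
    # Only unpickle files you produced yourself: pickle can execute arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)
```

The same `summarize` function can be applied to the public sample in the notebook, so that readers see the code that produced the full-data results run on real, shareable inputs.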
diff --git a/_posts/2023-06-21-long-running-code.md b/_posts/2023-06-21-long-running-code.md
@@ -0,0 +1,9 @@
+---
+layout: post
+title: My data analysis takes several hours/days/weeks... How should I address reproducibility?
+date: 2023-06-21 00:00:00
+tags: reproducibility
+description: Discuss reproducibility for long-running code
+---
+
+If your analyses, model tuning or training phase take a prohibitively long time to run when the notebook is compiled, you can include the results of the trained methods as a binary file. However, you must still provide the code that allows readers to fully reproduce the training phase, and illustrate that code on a small, toy-sized example.
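
A common pattern is to store the trained result once and reload it when the notebook is compiled, while keeping the full training code callable. The Python sketch below is illustrative only. `train` is a hypothetical stand-in for an expensive procedure (here, a toy gradient descent estimating a mean), and `train_or_load` with its `rerun` flag is an assumed helper, not part of the Computo templates.

```python
import pickle
from pathlib import Path


def train(data, n_iter: int) -> float:
    """Hypothetical stand-in for an expensive training procedure.

    Minimizes the mean of (theta - x)^2 / 2 by gradient descent,
    so theta converges to the sample mean of `data`.
    """
    theta = 0.0
    for _ in range(n_iter):
        grad = sum(theta - x for x in data) / len(data)
        theta -= 0.1 * grad
    return theta


def train_or_load(data, n_iter: int, cache: Path, rerun: bool = False) -> float:
    """Reuse the stored result unless a full re-run is explicitly requested.

    The cache is not keyed on the inputs: delete the file or pass
    rerun=True after changing `data` or `n_iter`.
    """
    if cache.exists() and not rerun:
        with open(cache, "rb") as f:
            return pickle.load(f)
    result = train(data, n_iter)
    with open(cache, "wb") as f:
        pickle.dump(result, f)
    return result
```

In the notebook, the toy-sized example would call `train` directly on small inputs, for example `train([1.0, 2.0, 3.0], n_iter=200)`, which runs in a fraction of a second. The full analysis would go through `train_or_load` with the shipped binary file.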