Commit 68f56e9

Merge pull request #43 from StanfordHPDS/bump_workflows
Proposal: bump pipelines to required workflows
2 parents 00644a4 + 4c772a4 commit 68f56e9

File tree

3 files changed: +23 additions, -31 deletions


_quarto.yml

Lines changed: 0 additions & 4 deletions
@@ -15,13 +15,9 @@ book:
   - chapters/08-code-review.qmd
   - chapters/09-code-workflow-agreements.qmd
   - chapters/10-pre-flight-checklist.qmd
-  - chapters/99-references.qmd

 bibliography: references.bib

 format:
   html:
     theme: cosmo
-
-
-

chapters/09-code-workflow-agreements.qmd

Lines changed: 23 additions & 23 deletions
@@ -361,9 +361,31 @@ Note that this applies only to Jupyter Notebooks. While Quarto uses the Jupyter
 Rendering a Quarto document always runs code from scratch by default.
 :::

+### Pipelines {#sec-pipelines}
+
+Pipeline tools are software that manage the execution of code. What makes them practical for research projects is that they track the relationships between the components of your project (meaning they know which order to run things in automatically) and rerun a component only when it is out of date (meaning you don't need to rerun your entire project because you updated one part of the code). They are also very handy for reproducing results, because a command or two runs the entire pipeline.
+
+Pipeline tools are helpful for projects of any size, but they are particularly suited to complex or computationally intense projects.
+
+::: panel-tabset
+## R
+
+The best pipeline tool in R is the targets package. targets is a native R tool, making it easy to work with R objects. It works particularly well with Quarto and R Markdown, allowing you to reduce the amount of code in a report while managing it reproducibly.
+
+targets has [excellent documentation and tutorials](https://books.ropensci.org/targets/), so we point you there for guidance.
+
+It's also possible to use tools like Make (see the Python tab), among others, with R, although we recommend targets for projects that are mostly R. For projects that mix languages, Make may be a better fit.
+
+## Python
+
+Python has several pipeline tools that are used in data engineering. In those larger data projects, such tools are sometimes called *orchestration* tools. That said, many of them are much more complex than a single research project needs.
+
+For research projects, we recommend [GNU Make](https://www.gnu.org/software/make/). Make is one of the oldest and most popular pipeline tools--over 40 years old. It shows its age in some ways, but it's also battle-tested. See [this tutorial](https://third-bit.com/py-rse/automate.html) for an example of running an analysis with Make.
+:::
+
 ### Provide Guidance on How to Run Your Code

-Your `README` should include guidance on how to run your code. For instance, if there is a command to run the entire project, include information about that process (this is usually related to pipeline-managed code as discussed in the optional @sec-pipelines). If you intend the user to run scripts in a particular order, describe how.
+Your `README` should include guidance on how to run your code. For instance, if there is a command to run the entire project, include information about that process (this is usually related to pipeline-managed code as discussed in @sec-pipelines). If you intend the user to run scripts in a particular order, describe how, but prefer using a pipeline tool to manage this instead.

 ## Lock your Package Versions {#sec-pkg-env}
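The "only rerun what is out of date" rule the added section describes is, at its core, a timestamp comparison between a target file and its inputs. As a toy sketch of that idea (not how targets or Make is actually implemented; the function name `out_of_date` is hypothetical):

```python
import os
import tempfile
import time
from pathlib import Path

def out_of_date(target: str, sources: list[str]) -> bool:
    """Return True if the target is missing or older than any of its sources."""
    t = Path(target)
    if not t.exists():
        return True  # never built: must run
    t_mtime = t.stat().st_mtime
    # Rebuild if any input changed after the target was produced
    return any(Path(s).stat().st_mtime > t_mtime for s in sources)

# Tiny demo with temporary files standing in for a dataset and a figure
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "data.csv")
    out = os.path.join(d, "figure.png")
    Path(src).write_text("x\n1\n")
    assert out_of_date(out, [src])            # target missing: rebuild
    Path(out).write_text("fake plot")
    os.utime(src, (time.time() + 10,) * 2)    # source now newer than target
    assert out_of_date(out, [src])            # stale: rebuild
```

A real pipeline tool layers a dependency graph on top of this check, so it also knows *which order* to run steps in.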

@@ -473,28 +495,6 @@ See the [documentation](https://docs.astral.sh/uv/getting-started/features/) for

 Opt-in workflows are things we do not require for a project but for which we offer guidance. Such workflows also allow the team to experiment with new things and see what works for projects and when.

-### Pipelines {#sec-pipelines}
-
-Pipeline tools are software that manage the execution of code. What's practical about this for research projects is that pipeline tools track the relationship between components in your project (meaning it knows which order to run things in automatically) and will only run those components when they are out of date (meaning you don't necessarily need to rerun your entire project because you updated one part of the code). They are also very handy for reproducing code, because they only require a command or two to run the entire pipeline.
-
-Pipeline tools are helpful for projects of any size, but they are particularly suited to complex or computationally intense projects.
-
-::: panel-tabset
-## R
-
-The best pipeline tool in R is the targets package. targets is a native R tool, making it easy to work with R objects. It works particularly well with Quarto and R Markdown, allowing you to reduce the amount of code in a report while managing it reproducibly.
-
-targets has [excellent documentation and tutorials](https://books.ropensci.org/targets/), so we point you there for guidance.
-
-It's also possible to use tools like Make (see the Python tab) among others, with R, although we recommend targets for projects that are mostly R. For projects that are a mix of languages, Make may be a better fit.
-
-## Python
-
-Python has several pipeline tools that are used in data engineering. For these larger data projects, these tools are sometimes called *orchestration* tools. That said, many of them are much more complex than is needed for a single research project.
-
-For research projects, we recommend [GNU Make](https://www.gnu.org/software/make/). Make is one of the oldest and most popular pipeline tools--over 40 years old. It shows its age in some ways, but it's also battle-tested. See [this tutorial](https://third-bit.com/py-rse/automate.html) for an example of running an analysis with Make.
-:::
-

 ### Testing {#sec-tests}

 In scientific work, two types of code tests are useful: code expectations and data expectations. Code should *behave* the way you expect, and data should *exist* the way you expect. If that is not the case, you either have identified a problem with your code and data or a problem with your expectations.
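As a sketch of the Make recommendation in the diff above, a research pipeline Makefile might look like the following. The file names and scripts are hypothetical, not from this repository; a real project would substitute its own stages.

```make
# Hypothetical two-stage analysis pipeline.
# Running `make` rebuilds only the targets whose prerequisites changed.
# Note: recipe lines must be indented with a tab, not spaces.

all: figures/plot.png

data/clean.csv: data/raw.csv scripts/clean.py
	python scripts/clean.py data/raw.csv data/clean.csv

figures/plot.png: data/clean.csv scripts/plot.py
	python scripts/plot.py data/clean.csv figures/plot.png

.PHONY: all
```

With this in place, `make` reruns the cleaning step only when the raw data or the cleaning script changes, then the plotting step only when its inputs change, which is the out-of-date behavior the section describes.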

chapters/99-references.qmd

Lines changed: 0 additions & 4 deletions
This file was deleted.
