Commit 1597bb5

Merge pull request #63 from paxtonfitzpatrick/main
edits to statement of need, DOIs for citations where available
2 parents 19a121b + fedaa0c commit 1597bb5

2 files changed: +103 -70 lines changed

paper/paper.bib

Lines changed: 10 additions & 1 deletion
@@ -7,6 +7,7 @@ @article{GranGrou16

 @article{HarrEtal20,
 author = {Charles R. Harris and K. Jarrod Millman and St{\'{e}}fan J. van der Walt and Ralf Gommers and Pauli Virtanen and David Cournapeau and Eric Wieser and Julian Taylor and Sebastian Berg and Nathaniel J. Smith and Robert Kern and Matti Picus and Stephan Hoyer and Marten H. van Kerkwijk and Matthew Brett and Allan Haldane and Jaime Fern{\'{a}}ndez del R{\'{i}}o and Mark Wiebe and Pearu Peterson and Pierre G{\'{e}}rard-Marchant and Kevin Sheppard and Tyler Reddy and Warren Weckesser and Hameer Abbasi and Christoph Gohlke and Travis E. Oliphant},
+doi = {10.1038/s41586-020-2649-2},
 journal = {Nature},
 number = {7825},
 pages = {357--362},
@@ -16,6 +17,7 @@ @article{HarrEtal20

 @article{Hunt07,
 author = {J D Hunter},
+doi = {10.1109/MCSE.2007.55},
 journal = {Computing in Science and Engineering},
 number = {3},
 pages = {90--95},
@@ -27,6 +29,7 @@ @inproceedings{KluyEtal16
 address = {Netherlands},
 author = {Thomas Kluyver and Benjamin Ragan-Kelley and Fernando P{\'e}rez and Brian Granger and Matthias Bussonnier and Jonathan Frederic and Kyle Kelley and Jessica Hamrick and Jason Grout and Sylvain Corlay and Paul Ivanov and Dami{\'a}n Avila and Safia Abdalla and Carol Willing},
 booktitle = {Positioning and Power in Academic Publishing: Players, Agents and Agendas},
+doi = {10.3233/978-1-61499-649-1-87},
 editor = {Fernando Loizides and Birgit Schmidt},
 pages = {87--90},
 publisher = {IOS Press},
@@ -54,12 +57,14 @@ @misc{Mann21b
 @inproceedings{McKi10,
 author = {W McKinney},
 booktitle = {Proceedings of the {Python} in Science Conference},
+doi = {10.25080/Majora-92bf1922-00a},
 pages = {51--56},
 title = {Data structures for statistical computing in {P}ython},
 year = {2010}}

 @article{MullEtal15,
 author = {Muller, Eilif and Bednar, James A and Diesmann, Markus and Gewaltig, Marc-Oliver and Hines, Michael and Davison, Andrew P},
+doi = {10.3389/fninf.2015.00011},
 journal = {Frontiers in Neuroinformatics},
 pages = {11},
 title = {Python in neuroscience},
@@ -68,6 +73,7 @@ @article{MullEtal15

 @article{PereGran07,
 author = {F P{\'e}rez and B E Granger},
+doi = {10.1109/MCSE.2007.53},
 journal = {Computing in science \& engineering},
 number = {3},
 pages = {21--29},
@@ -78,6 +84,7 @@ @article{PereGran07
 @inproceedings{RagaWill18,
 author = {Ragan-Kelley, Benjamin and Willing, Carol},
 booktitle = {Proceedings of the 17th Python in Science Conference},
+doi = {10.25080/MAJORA-4AF1F417-011},
 editor = {Akici, F and Lippa, D and Niederhut, D and Pacer, M},
 pages = {113--120},
 title = {Binder 2.0-Reproducible, interactive, sharable environments for science at scale},
@@ -94,6 +101,7 @@ @techreport{vanREtal14

 @article{VirtEtal20,
 author = {Virtanen, Pauli and Gommers, Ralf and Oliphant, Travis E. and Haberland, Matt and Reddy, Tyler and Cournapeau, David and Burovski, Evgeni and Peterson, Pearu and Weckesser, Warren and Bright, Jonathan and {van der Walt}, St{\'e}fan J. and Brett, Matthew and Wilson, Joshua and Millman, K. Jarrod and Mayorov, Nikolay and Nelson, Andrew R. J. and Jones, Eric and Kern, Robert and Larson, Eric and Carey, C J and Polat, {\.I}lhan and Feng, Yu and Moore, Eric W. and {VanderPlas}, Jake and Laxalde, Denis and Perktold, Josef and Cimrman, Robert and Henriksen, Ian and Quintero, E. A. and Harris, Charles R. and Archibald, Anne M. and Ribeiro, Ant{\^o}nio H. and Pedregosa, Fabian and {van Mulbregt}, Paul and {SciPy 1.0 Contributors}},
+doi = {10.1038/s41592-019-0686-2},
 journal = {Nature Methods},
 number = {3},
 pages = {261--272},
@@ -103,7 +111,8 @@ @article{VirtEtal20

 @article{Wask21,
 author = {Michael L. Waskom},
-journal = {The Open Journal},
+doi = {10.21105/joss.03021},
+journal = {Journal of Open Source Software},
 number = {60},
 pages = {3021},
 title = {seaborn: statistical data visualization},

paper/paper.md

Lines changed: 93 additions & 69 deletions
@@ -5,8 +5,8 @@ tags:
 - Jupyter Notebook
 - JupyterLab
 - Google Colab
-- Reproducibility
-- package-management
+- reproducibility
+- package management
 - import
 - install
 - pip
@@ -20,7 +20,7 @@ authors:
 affiliations:
 - name: Department of Psychological and Brain Sciences, Dartmouth College
 index: 1
-date: 17 December 2021
+date: 21 January 2022
 bibliography: paper.bib
 link-citations: true
 ---
@@ -36,22 +36,22 @@ users to distribute their code and environment together in a single,
 ready-to-run Jupyter notebook [@KluyEtal16].

 Importing `davos` enables an additional Python keyword: `smuggle`. The `smuggle`
-keyword can be used as a drop-in replacement for the built-in `import` keyword
-to load libraries, modules, and other objects into the current namespace.
-However, whereas `import` will fail if the requested package is not installed
-locally, `smuggle` statements can handle missing packages on the fly. If a
-smuggled package does not exist in the local environment, `davos` will install
-it, make its contents visible to Python's `import` machinery, and load it into
-the namespace for immediate use.
-
-To provide greater control over the behavior of `smuggle` statements, `davos`
-also defines an additional construct, called *onion comments*. An onion comment
-is a special type of inline comment that can be placed on any line containing a
-`smuggle` statement to customize how `davos` determines whether and how
-smuggled packages should be installed. Onion comments follow a simple syntax
+statement can be used as a drop-in replacement for the built-in `import`
+statement to load libraries, modules, and other objects into the current
+namespace. However, whereas `import` will fail if the requested package is not
+installed locally, `smuggle` statements can handle missing packages on the fly.
+If a smuggled package does not exist in the local environment, `davos` will
+install it, make its contents visible to Python's `import` machinery, and load
+it into the namespace for immediate use.
+
+For greater control over the behavior of `smuggle` statements, `davos` defines
+an additional construct called the *onion comment*. An onion comment is a
+special type of inline comment that can be placed on a line containing a
+`smuggle` statement to customize how `davos` determines whether and how the
+smuggled package should be installed. Onion comments follow a simple syntax
 based on the type comment syntax introduced in PEP 484 [@vanREtal14] and are
-designed to make controlling installation via `davos` intuitive and familiar.
-To construct an onion comment, simply provide the name of the installer program
+designed to make controlling installation via `davos` intuitive and familiar. To
+construct an onion comment, simply provide the name of the installer program
 (e.g., `pip`) and the same arguments one would use to install the package as
 desired manually via the command line:

@@ -67,9 +67,9 @@ However, the most powerful use of the onion comment is making `smuggle`
 statements *version-sensitive*. Adding a [version
 specifier](https://www.python.org/dev/peps/pep-0440/#version-specifiers) to an
 onion comment will cause `davos` to search for the smuggled package in the local
-environment (as usual), and if it exists, additionally check whether the
-installed version satisfies the given constraint(s). If either check fails,
-`davos` will install and use a suitable version of the package:
+environment (as usual), and if it is found, further check whether the installed
+version satisfies the given constraint(s). If either check fails, `davos` will
+install and use a suitable version of the package:

 ![](snippets/snippet3.pdf)

@@ -107,65 +107,89 @@ Data*](https://github.com/ContextLab/storytelling-with-data) [@Mann21b\; an open
 course on data science, visualization, and communication] and `abstract2paper`
 [@Mann21a\; a toy application of
 [GPT-Neo](https://github.com/EleutherAI/gpt-neo)]. A more extensive guide to
-using `davos`, additional examples, and a description of how it works are
-available [here](https://github.com/ContextLab/davos).
+using `davos`, additional examples, and implementation details are available
+[here](https://github.com/ContextLab/davos).


 # Statement of Need

 Modern open science practices encourage sharing code and data to enable others
-to explore, reproduce, and extend existing work. Scientists, researchers, and
-educators may seek to share analyses with collaborators, students, the research
-community, or the general public. Python is among the most widely used and
-fastest-growing scientific programming languages [@MullEtal15]. In addition to
-the language's high-level, accessible syntax and large standard library, the
-Python ecosystem offers a powerful and extensive data science toolkit designed
-to facilitate rapid development and collaboration, including platforms for
-interactive development [e.g., Project Jupyter, @KluyEtal16\; Google
-Colaboratory], community-maintained libraries for data manipulation [e.g.,
-`NumPy`, @HarrEtal20; `SciPy`, @VirtEtal20; `Pandas`, @McKi10] and
+to explore, reproduce, and build on existing work. Scientists, researchers, and
+educators may seek to share research-related code with collaborators, students,
+the research community, or the general public. Python is among the most widely
+used and fastest-growing scientific programming languages [@MullEtal15]. In
+addition to the language's high-level, accessible syntax and large standard
+library, the Python ecosystem offers a powerful and extensive data science
+toolkit designed to facilitate rapid development and collaboration, including
+platforms for interactive programming [e.g., Project Jupyter, @KluyEtal16\;
+Google Colaboratory], community-maintained libraries for data manipulation
+[e.g., `NumPy`, @HarrEtal20; `SciPy`, @VirtEtal20; `Pandas`, @McKi10] and
 visualization [e.g., `Matplotlib`, @Hunt07; `seaborn`, @Wask21], and myriad
-other tools.
+other tools.

 However, one impediment to sharing and reproducing computational work
 implemented in Python is that different versions of a given package or library
 can behave differently, use different syntax, add or drop support for specific
-functions or other libraries, address (or introduce) bugs, and so on. These
-challenges are true to some extent in any language or ecosystem, but they have a
-particular impact on the Python community due to its unusually rapid growth
-relative to other languages. Ensuring stable and reproducible results across
-users typically requires ensuring that the same versions of each library are
-installed. One approach is to use containerized or virtualized environments
-(e.g., using [Docker](https://www.docker.com/),
+functions or other libraries, address (or introduce) bugs, and so on. While
+these challenges are present to some extent in any language or ecosystem, they
+have a particular impact on the Python community due to its unusually rapid
+growth relative to other languages. Ensuring stable and reproducible results
+across users typically requires ensuring that shared code is always run with the
+same set of versions for each package used. One common approach to solving this
+problem is to create containerized or virtualized environments (e.g., using
+[Docker](https://www.docker.com/),
 [Singularity](https://sylabs.io/singularity/), or
-[conda](https://docs.conda.io/en/latest/)) that are effectively cordoned off
-from the user's primary Python installation. Configuration files may be used
-alongside these tools to construct environments that guarantee (within limits)
-the same or similar functionality across systems. However, a downside to
-relying on this approach is that it is highly resource intensive. For example,
-distributing research code that relies on a particular Docker image to run
-correctly requires the authors to distribute additional configuration files and
-instructions alongside their main code. Users must then download or build the
-image on their machine, which uses additional time and storage.
-
-`davos` provides an alternative way of ensuring stable functionality of iPython
-notebooks across users that is lightweight and contained entirely within the
-notebook file itself. All setup and configuration of packages needed to run the
-code in the notebook, including ensuring that the correct version of each
-package is utilized, may be managed by `davos`. Bypassing the need for
-the user to set up containers or virtual environments can enable users to run
-the notebook quickly and more easily.
-
-A second benefit of using `davos` (either in lieu of or alongside a different
-environment management tool) is that `smuggle` statements and onion comments
-continue to ensure requirements are satisfied after they are initially
-installed. For example, suppose a developer decides to install version 1.0 of
-package `x`, a critical library for some code they are working on. If `x`
-version 1.1 is a dependency of another package, `y`, then installing package `y`
-might overwrite version 1.0 of package `x` with version 1.1. This can lead to
-unexpected behavior if versions 1.0 and 1.1 of package `x` differ. To protect
-against unexpected behavior, `smuggle` statements and onion comments may be
-used to ensure that the expected versions of each library are imported.
+[conda](https://docs.conda.io/en/latest/)) that house fully isolated Python
+installations tailored to specific projects. These environments may be shared
+publicly as configuration files from which other users may build identical
+copies themselves. While effective, one drawback to this approach is that it can
+introduce a level of complexity beyond what is warranted for many simpler use
+cases. For example, distributing research code that relies on a particular
+Docker image to run properly not only necessitates extra configuration files and
+setup steps, but requires that both the author and end user install and navigate
+additional software that is often more complicated and resource-intensive than
+the actual code being shared. These added prerequisites clash with the
+simplicity and accessibility that have helped popularize Python among
+researchers, and can create barriers to both contributing to and taking
+advantage of open science.
+
+`davos` provides an alternative way to ensure stable functionality of Jupyter
+notebooks across users and over time that is intuitive, lightweight, and
+contained entirely within the notebook file itself. Using `smuggle` statements
+and onion comments, required packages can be specified directly within the code
+that uses them and automatically installed as they are needed. This offers two
+notable advantages over typical approaches to dependency management. First, it
+simplifies and expedites the process of sharing reproducible workflows by
+eliminating the need for additional configuration files, pre-execution setup,
+and environment management software (aside from `davos` itself). With `davos`,
+analyses, tutorials, and demos can be packaged and shared as "batteries
+included" notebooks that can be downloaded and immediately run, making them more
+accessible to less technical users.
+
+Second, `smuggle` statements and onion comments continue to ensure requirements
+are satisfied after they are initially installed. Most dependency specification
+schemes follow a common strategy: required packages and package versions are
+listed in a configuration file (e.g., a `requirements.txt`, `pyproject.toml`,
+`environment.yml`, `Pipfile`, `RUN` instructions in a `Dockerfile`, etc.) which
+is used to install them in a Python environment upfront. After this initial
+setup, however, this method generally does not ensure that the specified
+requirements *remain* installed, allowing them to be easily
+altered&mdash;sometimes inadvertently. This can lead to subtle issues when
+writing reproducible code in such a preconfigured environment. For instance,
+suppose a researcher has implemented a series of analyses using version 1.0 of
+"Package *X*," and later decides to perform an additional analysis that requires
+installing "Package *Y*." If Package *Y* depends on version 1.1 of Package *X*,
+then Package *X* will be upgraded to accommodate this new requirement. And if
+the researcher does not notice this change, differences between the two Package
+*X* versions risk introducing bugs into previously written code. Using `davos`,
+either in lieu of or alongside a different environment management tool, provides
+a safeguard against this situation. `smuggle` statements and onion comments
+enforce requirements every time they are executed, guaranteeing the expected
+version of each package is always used. This would not only catch and correct
+the unintentional change to Package *X*, but would also allow the researcher to
+choose whether to manually resolve the inconsistency or, if appropriate,
+`smuggle` different versions of the package as necessary.
+

 # Origin of the Name

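Reviewer note: the revised Statement of Need says that `smuggle` statements re-check requirements every time they run, rather than only at initial setup. A minimal plain-Python sketch of that check-then-install idea follows. It is not how `davos` is implemented; `needs_install`, `smuggle_sketch`, and `pip_install` are hypothetical names introduced here, and the sketch assumes exact-version pins only (full PEP 440 specifiers would need a proper parser).

```python
import importlib
import subprocess
import sys
from importlib.metadata import PackageNotFoundError, version


def needs_install(dist_name, required=None, get_version=version):
    """Return True if the distribution is missing or not at the pinned version."""
    try:
        installed = get_version(dist_name)
    except PackageNotFoundError:
        return True  # not installed at all
    # Assumption: only exact pins are compared, as plain strings.
    return required is not None and installed != required


def pip_install(spec):
    """Install a requirement with pip in the current interpreter's environment."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", spec])


def smuggle_sketch(module_name, dist_name=None, required=None,
                   installer=pip_install, get_version=version):
    """Hypothetical stand-in for a smuggle statement with a version pin."""
    dist_name = dist_name or module_name
    # The check runs on every call, so a pin that another installation
    # silently changed is detected and corrected before the import.
    if needs_install(dist_name, required, get_version):
        spec = dist_name if required is None else f"{dist_name}=={required}"
        installer(spec)
        importlib.invalidate_caches()  # let the import system see new files
    return importlib.import_module(module_name)
```

In the Package *X* scenario from the diff, calling `smuggle_sketch("x", required="1.0")` after Package *Y* has pulled in version 1.1 would trigger a reinstall of `x==1.0`. The sketch does not reload a module that was already imported in the running session, which a real tool would also have to handle.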