Commit 6fa37eb

Merge pull request #39 from StanfordHPDS/uv
update workflows to use uv and make
2 parents e2c2591 + 4d15a7d commit 6fa37eb

File tree

1 file changed (+42 additions, -19 deletions)


chapters/09-code-workflow-agreements.qmd

Lines changed: 42 additions & 19 deletions
@@ -420,31 +420,60 @@ One common problem in renv is that it keeps track of but does not manage the R v
 
 Note: The Python ecosystem for managing environments is vast. See <https://alpopkes.com/posts/python/packaging_tools/> for an overview.
 
-We currently recommend [Conda](https://docs.conda.io/en/latest/) via the [miniconda distribution](https://docs.anaconda.com/free/miniconda/index.html). Conda allows you to install packages from Conda channels, set up virtual environments, and control the version of Python.
+We recommend [uv](https://docs.astral.sh/uv/getting-started/installation/). uv allows you to install packages, set up virtual environments, and control the version of Python. uv manages most of this for you and works well across operating systems. It's also very fast.
 
-Notably, we recommend a *Conda-first* approach. If you're installing a package, use `conda install` before trying `pip install`. PyPi and Conda channels build packages differently, so it's best to stick with one style where possible. However, Conda has a smaller, more refined selection of packages compared to PyPi, so some packages may only be available via `pip`. See [this blog post](https://www.anaconda.com/blog/understanding-conda-and-pip) for more information on the differences between the two.
-
-Conda treats Python like other packages, so managing it is similar. One helpful thing you can do is specify a Python version while creating an environment. See the [Conda documentation](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-python.html) for other ways of interacting with the Python version.
+`uv init` will set up a uv environment for you, creating several files: `pyproject.toml`, `.python-version`, and `main.py`, as well as a few other files if they don't already exist, such as a `README` and `.gitignore` file. Once you start adding packages to the project, uv will also generate a lock file with the resolved package dependencies (`uv.lock`) and a virtual environment with [venv](https://docs.python.org/3/library/venv.html) in the `.venv/` directory.
 
 ``` bash
-# create a new virtual environment called projectenv
-conda create --name projectenv python=3.12.1
+# create a new project called my_new_project
+uv init my_new_project
+
+# or, from within an existing project, just run init
+uv init
 
-# activate the environment projectenv
-conda activate projectenv
+# manually sync the virtual environment
+# mostly uv will handle this for you, but it can be handy
+uv sync
 
 # add a package to your environment
-conda install polars
+# don't use pip! `uv add` installs from PyPI but manages the environment for you
+uv add polars
+```
+
+To run scripts, use `uv run`. This is a drop-in replacement for `python script.py`; `uv run` always runs the script in the project's virtual environment, so it is good practice.
+
+``` bash
+# run a script in the virtual environment
+uv run my_script.py
+```
+
+To manage Python versions, use `uv python`:
+
+``` bash
+# install Python 3.12
+uv python install 3.12
+
+# and set it as the Python version for the project
+uv python pin 3.12
+```
+
+To run development tools like `ruff`, use `uvx`, which allows you to use such tools without adding them as dependencies to your project.
+
+``` bash
+uvx ruff format
 ```
 :::
 
+See the [documentation](https://docs.astral.sh/uv/getting-started/features/) for other commands, particularly those under the Python versions, projects, and tools headings.
+
 ## Opt-in Workflows {#sec-opt-in}
 
 Opt-in workflows are things we do not require for a project but for which we offer guidance. Such workflows also allow the team to experiment with new things and see what works for projects and when.
 
 ### Pipelines {#sec-pipelines}
 
-Pipeline tools are software that manage the execution of code. What's practical about this for research projects is that pipeline tools track the relationship between components in your project (meaning it knows which order to run things in automatically) and will only run those components when they are out of date (meaning you don't necessarily need to rerun your entire project because you updated one part of the code).
+Pipeline tools are software that manage the execution of code. What's practical about this for research projects is that pipeline tools track the relationships between the components in your project (meaning they know which order to run things in automatically) and will only run those components when they are out of date (meaning you don't necessarily need to rerun your entire project because you updated one part of the code). They are also very handy for reproducing results, because they only require a command or two to run the entire pipeline.
 
 Pipeline tools are helpful for projects of any size, but they are particularly suited to complex or computationally intense projects.
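For orientation while reading the diff above: after `uv init my_new_project` followed by `uv add polars`, the generated `pyproject.toml` typically looks roughly like the sketch below. The project name, description, and version pins here are illustrative; exact contents vary with the uv and package versions.

``` toml
# illustrative pyproject.toml after `uv init` and `uv add polars`
[project]
name = "my-new-project"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "polars>=1.0.0",
]
```

`uv add` edits the `dependencies` list and updates `uv.lock`; you rarely need to touch this file by hand.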
@@ -455,19 +484,13 @@ The best pipeline tool in R is the targets package. targets is a native R tool,
 
 targets has [excellent documentation and tutorials](https://books.ropensci.org/targets/), so we point you there for guidance.
 
-It's also possible to use tools like Make and Snakemake, among others (see the Python tab), with R, although we recommend targets for projects that are mostly R.
+It's also possible to use tools like Make (see the Python tab), among others, with R, although we recommend targets for projects that are mostly R. For projects that are a mix of languages, Make may be a better fit.
 
 ## Python
 
-Python has several pipeline tools that are used in data engineering. For these larger data projects, these tools are sometimes called *orchestration* tools. That said, many of them are much more complex than is needed for a single research project.
-
-We don't currently have a recommendation. Here are the tools we should explore:
+Python has several pipeline tools that are used in data engineering. For these larger data projects, these tools are sometimes called *orchestration* tools. That said, many of them are much more complex than is needed for a single research project.
 
-1. [Make](https://www.gnu.org/software/make/): Make is one of the oldest and most popular pipeline tools--over 40 years old. It shows its age in some ways, but it's also battle-tested. See [this tutorial](https://third-bit.com/py-rse/automate.html) for an example of running an analysis with Make.
-2. [Snakemake](https://snakemake.github.io/): A Python tool influenced by Make, it's very similar in spirit but more modern. It's easier to read and customize, and you can write Python code within the Snakefile. It also works nicely with Python and R scripts. One handy thing is that you can access Snakemake inputs and outputs through magic objects for both Python and R. That makes it useful for, e.g., dynamic rules and rules where you want to use inputs for reports. Snakemake has good support for Conda environments in particular.
-3. [Dagster](https://docs.dagster.io/getting-started/hello-dagster): Dagster is a different approach that uses decorators to tag Python functions as "assets," the apparent equivalent of targets in other tools. It has a nice UI tool for visualizing the nodes and relationships of the project, as well as "materializing" (running) them. Dagster is a company, but the tool has an open-source version.
-4. [Prefect](https://www.prefect.io/opensource): Prefect is an increasingly popular tool for orchestration. Like Dagster, it is a company-based open-source tool that uses decorators to tag Python functions. It also has a UI.
-5. [AirFlow](https://airflow.apache.org/): AirFlow is an open-source orchestration tool. Notably, Google Cloud supports a UI for creating and running AirFlow called [Cloud Composer](https://cloud.google.com/composer/docs/composer-2/composer-overview). AirFlow also has a recent decorator-based API called [TaskFlow](https://airflow.apache.org/docs/apache-airflow/stable/tutorial/taskflow.html), which is more in line with some of the above tools.
+For research projects, we recommend [GNU Make](https://www.gnu.org/software/make/). Make is one of the oldest and most popular pipeline tools--over 40 years old. It shows its age in some ways, but it's also battle-tested. See [this tutorial](https://third-bit.com/py-rse/automate.html) for an example of running an analysis with Make.
 :::
 
 ### Testing {#sec-tests}

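The rewritten Python tab above settles on GNU Make. As a hedged sketch of what that looks like in practice (the file and script names here are hypothetical, not from the chapter), a two-step analysis pipeline in a Makefile might be:

``` make
# Hypothetical two-step pipeline: clean the raw data, then build a figure.
# Make reruns a step only when its inputs are newer than its output.
# Note: recipe lines must be indented with a tab character, not spaces.

all: output/figure.png

output/clean_data.csv: data/raw_data.csv src/clean.py
	uv run src/clean.py data/raw_data.csv output/clean_data.csv

output/figure.png: output/clean_data.csv src/plot.py
	uv run src/plot.py output/clean_data.csv output/figure.png
```

Running `make` builds everything that is out of date; after editing only `src/plot.py`, Make would rerun just the plotting step.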