This repository was archived by the owner on Dec 31, 2025. It is now read-only.

Commit 73b0928

Merge pull request #31 from poldrack/text/workflows-Dec27
Text/workflows dec27
2 parents 51c7377 + d0bd54b commit 73b0928

20 files changed: +2538 -577 lines changed

.dockerignore

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+# Git
+.git
+.gitignore
+
+# Python
+__pycache__
+*.py[cod]
+*$py.class
+*.so
+.Python
+.venv
+venv
+ENV
+
+# Build artifacts
+_build
+*.egg-info
+dist
+build
+
+# IDE
+.idea
+.vscode
+*.swp
+*.swo
+
+# Jupyter
+.ipynb_checkpoints
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Docker (keep Dockerfile accessible)
+docker/Makefile
+docker/README.md
+
+# Misc
+*.log
+.env
+.env.*

book/workflows.md

Lines changed: 12 additions & 0 deletions
@@ -505,6 +505,18 @@ total 5
 
 Similarly, Snakemake will rerun the workflow if any of the scripts used to run the workflow are modified. However, it's important to note that it will not identify changes in the modules that those scripts import. In that case you would need to force the workflow to re-execute the relevant steps.
 
+#### Reproducible environments with Conda
+
+After installing Miniconda, create and activate a `bettercode` environment that includes Snakemake, then install the package into it:
+
+
+```bash
+conda create -c conda-forge -c bioconda -c nodefaults -n bettercode snakemake
+conda activate bettercode
+pip install -e .
+```
+
+
 ## Scaling to a complex workflow
 
 We now turn to a more realistic and complex scientific data analysis workflow. For this example I will use an analysis of single-cell RNA-sequencing data to determine how gene expression in immune system cells changes with age. This analysis will utilize a [large openly available dataset](https://cellxgene.cziscience.com/collections/dde06e0f-ab3b-46be-96a2-a8082383c4a1) that includes data from 982 people comprising about 1.3 million peripheral blood mononuclear cells (i.e. white blood cells) for about 35K transcripts. I chose this particular example for several reasons:
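
The note in this hunk says that Snakemake notices edits to a workflow script but not to the modules that script imports. One way to catch such edits yourself is sketched below: hash the script together with the local modules it depends on, and compare the result with a digest stored from the previous run. This is only an illustration and not part of Snakemake or of this repository; the function names `combined_digest` and `has_changed`, and the idea of keeping the digest in a small text file, are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def combined_digest(paths):
    """Return one SHA-256 hex digest covering the contents of all given files."""
    digest = hashlib.sha256()
    # Sort so the result does not depend on the order the caller lists files in.
    for path in sorted(Path(p) for p in paths):
        digest.update(path.name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def has_changed(paths, digest_file):
    """Compare the current digest with the stored one, then store the new one."""
    digest_file = Path(digest_file)
    current = combined_digest(paths)
    previous = digest_file.read_text().strip() if digest_file.exists() else None
    digest_file.write_text(current + "\n")
    return previous != current
```

If `has_changed` reports a difference for a script and its helper modules, the affected rules can be re-executed explicitly, for example with Snakemake's `--forcerun` option.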

docker/Dockerfile

Lines changed: 73 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,73 @@
1+
# Dockerfile for BetterCodeBetterScience
2+
# Builds an image with all dependencies for running the code examples
3+
4+
FROM python:3.12-slim-bookworm
5+
6+
LABEL maintainer="Russell Poldrack"
7+
LABEL description="Container for BetterCodeBetterScience book code examples"
8+
9+
# Prevent interactive prompts during package installation
10+
ENV DEBIAN_FRONTEND=noninteractive
11+
12+
# Install system dependencies
13+
RUN apt-get update && apt-get install -y --no-install-recommends \
14+
# Build essentials
15+
build-essential \
16+
gcc \
17+
g++ \
18+
gfortran \
19+
# Git for datalad and version control
20+
git \
21+
git-annex \
22+
# HDF5 for h5py
23+
libhdf5-dev \
24+
# For scientific packages
25+
libopenblas-dev \
26+
liblapack-dev \
27+
# For igraph/leidenalg
28+
libigraph-dev \
29+
# For image processing
30+
libjpeg-dev \
31+
libpng-dev \
32+
# For SSL/networking
33+
libssl-dev \
34+
libcurl4-openssl-dev \
35+
# For XML parsing
36+
libxml2-dev \
37+
libxslt1-dev \
38+
# R and dependencies for rpy2
39+
r-base \
40+
r-base-dev \
41+
# Misc utilities
42+
curl \
43+
wget \
44+
ca-certificates \
45+
&& rm -rf /var/lib/apt/lists/*
46+
47+
# Install uv for fast Python package management
48+
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
49+
50+
# Set up working directory
51+
WORKDIR /app
52+
53+
# Copy project files
54+
COPY pyproject.toml README.md ./
55+
COPY src/ ./src/
56+
57+
# Create virtual environment and install dependencies
58+
RUN uv venv /app/.venv
59+
ENV VIRTUAL_ENV=/app/.venv
60+
ENV PATH="/app/.venv/bin:$PATH"
61+
62+
# Install the project and all dependencies
63+
RUN uv pip install -e .
64+
65+
# Copy remaining project files
66+
COPY book/ ./book/
67+
COPY tests/ ./tests/
68+
COPY data/ ./data/
69+
COPY scripts/ ./scripts/
70+
COPY myst.yml ./
71+
72+
# Set default command
73+
CMD ["python", "--version"]

docker/Makefile

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
# Run from project root: make -f docker/Makefile build
2+
build:
3+
docker build -f docker/Dockerfile -t bettercode .
4+
5+
# Or run from docker directory: make build-from-here
6+
build-from-here:
7+
docker build -f Dockerfile -t bettercode ..

docker/README.md

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,26 @@
1+
# Docker for BetterCodeBetterScience
2+
3+
## Building the image
4+
5+
From the repository root:
6+
7+
```bash
8+
docker build -f docker/Dockerfile -t bettercode .
9+
```
10+
11+
## Running the container
12+
13+
Interactive shell:
14+
```bash
15+
docker run -it bettercode /bin/bash
16+
```
17+
18+
Run tests:
19+
```bash
20+
docker run bettercode pytest
21+
```
22+
23+
Mount local data directory:
24+
```bash
25+
docker run -v /path/to/local/data:/data bettercode python script.py
26+
```
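
The mount example in this README depends on the `-v host:container` form, where a bare name such as `data` on the host side is read by Docker as a named volume rather than a directory. As a small sketch of how to avoid that ambiguity, the helper below resolves the host directory to an absolute path before building the `docker run` argument list. The function name `docker_run_command` and its defaults are assumptions for illustration; only the image name `bettercode`, the `/data` mount point, and `python script.py` come from the README.

```python
import shlex
from pathlib import Path


def docker_run_command(host_dir, command, image="bettercode", container_dir="/data"):
    """Build the argument list for `docker run` with a bind-mounted data directory."""
    # Resolve to an absolute path so Docker cannot mistake it for a volume name.
    host_path = Path(host_dir).expanduser().resolve()
    return ["docker", "run", "-v", f"{host_path}:{container_dir}", image, *command]


if __name__ == "__main__":
    args = docker_run_command("data", ["python", "script.py"])
    # Print a copy-pasteable shell command instead of executing it.
    print(shlex.join(args))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting.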
File renamed without changes.

src/bettercode/rnaseq/snakemake_workflow/Makefile

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ rulegraph:
 
 # Run the full workflow
 run:
-	snakemake --cores 8 --config datadir=$(DATADIR)/immune_aging/wf_snakemake/
+	snakemake --cores 8 --sdm conda --config datadir=$(DATADIR)/immune_aging/wf_snakemake/
 # Generate HTML report (run after workflow completes)
 report:
 	snakemake --report $(DATADIR)/immune_aging/wf_snakemake/report.html --config datadir=$(DATADIR)/immune_aging/

src/bettercode/rnaseq/snakemake_workflow/Snakefile

Lines changed: 2 additions & 0 deletions
@@ -88,6 +88,8 @@ rule aggregate_results:
         RESULTS_DIR / "workflow_complete.txt",
     log:
         LOG_DIR / "aggregate_results.log",
+    conda:
+        "bettercode"
     script:
         "scripts/aggregate_results.py"
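
The `conda: "bettercode"` directive added here refers to an environment by name instead of pointing at an environment file, so that environment must already exist on the machine when the workflow runs with `--sdm conda`. The sketch below shows one way to check this before launching Snakemake. It assumes that `conda env list --json` prints a JSON object with an `"envs"` list of environment paths; the function names `env_names` and `named_env_exists` are hypothetical and not part of this repository.

```python
import json
import subprocess
from pathlib import Path


def env_names(conda_json):
    """Extract environment names from the JSON text of `conda env list --json`."""
    prefixes = json.loads(conda_json).get("envs", [])
    # Each entry is the path of an environment; its last component is the name.
    # Note: the base environment appears under its install directory name.
    return {Path(prefix).name for prefix in prefixes}


def named_env_exists(name, conda_exe="conda"):
    """Return True if a Conda environment called `name` is present locally."""
    result = subprocess.run(
        [conda_exe, "env", "list", "--json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return name in env_names(result.stdout)
```

A wrapper script could call `named_env_exists("bettercode")` and stop with a clear message pointing to the `conda create` step when it returns `False`.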

src/bettercode/rnaseq/snakemake_workflow/rules/per_cell_type.smk

Lines changed: 8 additions & 0 deletions
@@ -29,6 +29,8 @@ rule differential_expression:
     threads: config["differential_expression"]["n_cpus"]
     log:
         LOG_DIR / "step08_de_{cell_type}.log",
+    conda:
+        "bettercode"
     script:
         "../scripts/differential_expression.py"
 
@@ -54,6 +56,8 @@ rule pathway_analysis:
         ),
     log:
         LOG_DIR / "step09_gsea_{cell_type}.log",
+    conda:
+        "bettercode"
     script:
         "../scripts/gsea.py"
 
@@ -81,6 +85,8 @@ rule overrepresentation:
         ),
     log:
         LOG_DIR / "step10_enrichr_{cell_type}.log",
+    conda:
+        "bettercode"
     script:
         "../scripts/enrichr.py"
 
@@ -110,5 +116,7 @@ rule predictive_modeling:
         ),
     log:
         LOG_DIR / "step11_prediction_{cell_type}.log",
+    conda:
+        "bettercode"
     script:
         "../scripts/prediction.py"

src/bettercode/rnaseq/snakemake_workflow/rules/preprocessing.smk

Lines changed: 12 additions & 0 deletions
@@ -18,6 +18,8 @@ rule download_data:
         url=config["url"],
     log:
         LOG_DIR / "step01_download.log",
+    conda:
+        "bettercode"
     script:
         "../scripts/download.py"
 
@@ -40,6 +42,8 @@ rule filter_data:
         figure_dir=str(FIGURE_DIR),
     log:
         LOG_DIR / "step02_filtering.log",
+    conda:
+        "bettercode"
     script:
         "../scripts/filter.py"
 
@@ -81,6 +85,8 @@ rule quality_control:
         figure_dir=str(FIGURE_DIR),
     log:
         LOG_DIR / "step03_qc.log",
+    conda:
+        "bettercode"
     script:
         "../scripts/qc.py"
 
@@ -98,6 +104,8 @@ rule preprocess:
         batch_key=config["preprocessing"]["batch_key"],
     log:
         LOG_DIR / "step04_preprocessing.log",
+    conda:
+        "bettercode"
     script:
         "../scripts/preprocess.py"
 
@@ -126,6 +134,8 @@ rule dimensionality_reduction:
         figure_dir=str(FIGURE_DIR),
     log:
         LOG_DIR / "step05_dimred.log",
+    conda:
+        "bettercode"
     script:
         "../scripts/dimred.py"
 
@@ -146,5 +156,7 @@ rule clustering:
         figure_dir=str(FIGURE_DIR),
     log:
         LOG_DIR / "step06_clustering.log",
+    conda:
+        "bettercode"
     script:
         "../scripts/cluster.py"
