
Commit fa1ce7d

Merge pull request #175 from VariantEffect/release-2024.0.0 (Release 2024.0.0)
2 parents: 89945b5 + 6940c1c

126 files changed: +11696 −4615 lines changed


.github/workflows/run-tests-on-push.yml

Lines changed: 13 additions & 0 deletions
@@ -42,3 +42,16 @@ jobs:
       - run: pip install .[dev,server]
       - run: pytest tests/
+
+  run-mypy-3_10:
+    runs-on: ubuntu-latest
+    name: MyPy checks on Python 3.10
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.10"
+          cache: 'pip'
+      - run: pip install --upgrade pip
+      - run: pip install .[dev,server]
+      - run: mypy src/
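The new CI job runs `mypy src/` against Python 3.10. As a minimal illustration of the kind of mismatch such a static check catches (a hypothetical function, not taken from the MaveDB codebase):

```python
def score_ratio(numerator: float, denominator: float) -> float:
    """Return a simple ratio; mypy checks callers against these annotations."""
    return numerator / denominator

# A well-typed call passes at runtime and under mypy:
print(score_ratio(3.0, 2.0))  # 1.5

# mypy (not the runtime) would reject score_ratio("3", 2.0) with an
# incompatible-type error before the code is ever executed.
```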

.gitignore

Lines changed: 0 additions & 4 deletions
@@ -94,10 +94,6 @@ ipython_config.py
 # PEP 582; used by e.g. github.com/David-OConnor/pyflow
 __pypackages__/

-# Celery stuff
-celerybeat-schedule
-celerybeat.pid
-
 # SageMath parsed files
 *.sage.py


Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@ FROM python:3.9 AS downloader
 WORKDIR /data

 # Install tools necessary used to install samtools and htslib so we can configure fasta files for genomic assembly.
-RUN apt-get update && apt-get install -y \
+RUN apt-get clean && apt-get update && apt-get install -y \
     build-essential \
     curl \
     git \
@@ -27,7 +27,7 @@ RUN curl -L https://github.com/samtools/htslib/releases/download/${htsversion}/h
     curl -L https://github.com/samtools/bcftools/releases/download/${htsversion}/bcftools-${htsversion}.tar.bz2 | tar xj && \
     (cd bcftools-${htsversion} && ./configure --enable-libgsl --enable-perl-filters --with-htslib=system && make install)

-# Fetch and index GRCh37 and GRCh38 assemblies. These will augment seqrepo transcript sequences.
+# Fetch and index GRCh37 and GRCh38 assemblies.
 RUN wget -O - https://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/Homo_sapiens/all_assembly_versions/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.fna.gz | gzip -d | bgzip > GCF_000001405.25_GRCh37.p13_genomic.fna.gz
 RUN wget -O - https://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/Homo_sapiens/all_assembly_versions/GCF_000001405.39_GRCh38.p13/GCF_000001405.39_GRCh38.p13_genomic.fna.gz | gzip -d | bgzip > GCF_000001405.39_GRCh38.p13_genomic.fna.gz
 RUN samtools faidx GCF_000001405.25_GRCh37.p13_genomic.fna.gz

Dockerfile.test

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+FROM ubuntu:latest
+RUN apt-get update
+RUN DEBIAN_FRONTEND=noninteractive apt-get upgrade -y
+RUN DEBIAN_FRONTEND=noninteractive apt-get install -y python3 python3-pip python3-psycopg2 postgresql libpq-dev
+WORKDIR /code
+
+# Install Python packages.
+COPY LICENSE README.md pyproject.toml ./
+COPY src/ ./src/
+COPY tests/ ./tests/
+COPY mypy_stubs ./mypy_stubs/
+
+RUN pip install --no-cache-dir --upgrade pip
+RUN pip install --no-cache-dir --upgrade .[dev,server]
+RUN useradd testuser -d /code
+
+RUN --network=none su testuser -c pytest
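The `RUN --network=none` flag in this file requires BuildKit, so invoking the test image would look roughly like the following (the image tag is an arbitrary choice for illustration, not something defined in the repository):

```shell
# Build the test image. The build itself runs pytest as the unprivileged
# testuser with networking disabled, so a successful build means the
# test suite passed.
DOCKER_BUILDKIT=1 docker build -f Dockerfile.test -t mavedb-test .
```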

Dockerfile.worker

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+FROM python:3.9 AS downloader
+
+WORKDIR /data
+
+# Install tools necessary used to install samtools and htslib so we can configure fasta files for genomic assembly.
+RUN apt-get clean && apt-get update && apt-get install -y \
+    build-essential \
+    curl \
+    git \
+    libbz2-dev \
+    libcurl4-openssl-dev \
+    libgsl0-dev \
+    liblzma-dev \
+    libncurses5-dev \
+    libperl-dev \
+    libssl-dev \
+    zlib1g-dev \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install samtools and htslib.
+ARG htsversion=1.19
+RUN curl -L https://github.com/samtools/htslib/releases/download/${htsversion}/htslib-${htsversion}.tar.bz2 | tar xj && \
+    (cd htslib-${htsversion} && ./configure --enable-plugins --with-plugin-path='$(libexecdir)/htslib:/usr/libexec/htslib' && make install) && \
+    ldconfig && \
+    curl -L https://github.com/samtools/samtools/releases/download/${htsversion}/samtools-${htsversion}.tar.bz2 | tar xj && \
+    (cd samtools-${htsversion} && ./configure --with-htslib=system && make install) && \
+    curl -L https://github.com/samtools/bcftools/releases/download/${htsversion}/bcftools-${htsversion}.tar.bz2 | tar xj && \
+    (cd bcftools-${htsversion} && ./configure --enable-libgsl --enable-perl-filters --with-htslib=system && make install)
+
+# Fetch and index GRCh37 and GRCh38 assemblies.
+RUN wget -O - https://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/Homo_sapiens/all_assembly_versions/GCF_000001405.25_GRCh37.p13/GCF_000001405.25_GRCh37.p13_genomic.fna.gz | gzip -d | bgzip > GCF_000001405.25_GRCh37.p13_genomic.fna.gz
+RUN wget -O - https://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/Homo_sapiens/all_assembly_versions/GCF_000001405.39_GRCh38.p13/GCF_000001405.39_GRCh38.p13_genomic.fna.gz | gzip -d | bgzip > GCF_000001405.39_GRCh38.p13_genomic.fna.gz
+RUN samtools faidx GCF_000001405.25_GRCh37.p13_genomic.fna.gz
+RUN samtools faidx GCF_000001405.39_GRCh38.p13_genomic.fna.gz
+
+FROM python:3.9
+COPY --from=downloader /data /data
+
+WORKDIR /code
+
+# Install the application dependencies.
+COPY ./requirements.txt /code/requirements.txt
+RUN pip install --no-cache-dir --upgrade -r /code/requirements.txt
+
+# Install the application code.
+COPY src /code/src
+COPY src/mavedb/server_main.py /code/main.py
+
+ENV PYTHONPATH "${PYTHONPATH}:/code/src"
+
+CMD ["arq", "mavedb.worker.WorkerSettings"]
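The `ENV PYTHONPATH "${PYTHONPATH}:/code/src"` line makes the copied sources importable without installing the package. A stdlib-only sketch of the same effect from inside Python:

```python
import sys

# Equivalent of extending PYTHONPATH: append the source directory so that
# imports resolve against /code/src (the path used in the Dockerfile above).
extra = "/code/src"
if extra not in sys.path:
    sys.path.append(extra)

print(extra in sys.path)  # True
```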

README.md

Lines changed: 19 additions & 23 deletions
@@ -1,8 +1,8 @@
 # mavedb-api

 API for MaveDB. MaveDB is a biological database for Multiplex Assays of Variant Effect (MAVE) datasets.
-The API powers the MaveDB website at [mavedb.org](https://www.mavedb.org) and can also be called separately (see
-instructions [below](#using-mavedb-api)).
+The API powers the MaveDB website at [mavedb.org](https://www.mavedb.org) and can also be called separately (see
+instructions [below](#using-mavedb-api)).


 For more information about MaveDB or to cite MaveDB please refer to the
@@ -44,56 +44,45 @@ The distribution can be uploaded to PyPI using [twine](https://twine.readthedocs
 For use as a server, this distribution includes an optional set of dependencies, which are only invoked if the package
 is installed with `pip install mavedb[server]`.

-### Running the API server in Docker on production and test systems
+### Running a local version of the API server

 First build the application's Docker image:
 ```
 docker build --tag mavedb-api/mavedb-api .
 ```
 Then start the application and its database:
 ```
-docker-compose -f docker-compose-prod.yml up -d
+docker-compose -f docker-compose-local.yml up -d
 ```
 Omit `-d` (daemon) if you want to run the application in your terminal session, for instance to see startup errors without having
 to inspect the Docker container's log.

 To stop the application when it is running as a daemon, run
 ```
-docker-compose -f docker-compose-prod.yml down
+docker-compose -f docker-compose-local.yml down
 ```

-`docker-compose-prod.yml` configures two containers: one for the API server and one for the PostgreSQL database. The
-The database stores data in a Docker volume named `mavedb-data`, which will persist after running `docker-compose down`.
+`docker-compose-local.yml` configures four containers: one for the API server, one for the PostgreSQL database, one for the
+worker node and one for the Redis cache which acts as the job queue for the worker node. The worker node stores data in a Docker
+volume named `mavedb-redis` and the database stores data in a Docker volume named `mavedb-data`. Both these volumes will persist
+after running `docker-compose down`.

 **Notes**
 1. The `mavedb-api` container requires the following environment variables, which are configured in
-   `docker-compose-prod.yml`:
+   `docker-compose-local.yml`:

    - DB_HOST
    - DB_PORT
    - DB_DATABASE_NAME
    - DB_USERNAME
    - DB_PASSWORD
    - NCBI_API_KEY
+   - REDIS_IP
+   - REDIS_PORT

    The database username and password should be edited for production deployments. `NCBI_API_KEY` will be removed in
    the future. **TODO** Move these to an .env file.

-2. In the procedure given above, we do not push the Docker image to a repository like Docker Hub; we simply build the
-   image on the machine where it will be used. But to deploy the API server on the AWS-hosted test site, first tag the
-   image appropriately and push it to Elastic Container Repository. (These commands require )
-   ```
-   export ECRPASSWORD=$(aws ecr get-login-password --region us-west-2 --profile mavedb-test)
-   echo $ECRPASSWORD | docker login --username AWS --password-stdin {aws_account_id}.dkr.ecr.us-west-2.amazonaws.com
-   docker tag mavedb-api:latest {aws_account_id}.dkr.ecr.us-west-2.amazonaws.com/mavedb-api
-   docker push {aws_account_id}.dkr.ecr.us-west-2.amazonaws.com/mavedb-api
-   ```
-   These commands presuppose that you have the [AWS CLI](https://aws.amazon.com/cli/) installed and have created a named
-   profile, `mavedb-test`, with your AWS credentials.
-
-   With the Docker image pushed to ECR, you can now deploy the application. **TODO** Add instructions if we want to
-   document this.
-
 ### Running the API server in Docker for development

 A similar procedure can be followed to run the API server in development mode on your local machine. There are a couple
@@ -134,3 +123,10 @@ Before using either of these methods, configure the environment variables descri

 If you use PyCharm, the first method can be used in a Python run configuration, but the second method supports PyCharm's
 FastAPI run configuration.
+
+### Running the API server for production
+
+We maintain deployment configuration options and steps within a [private repository](https://github.com/VariantEffect/mavedb-deployment) used for deploying this source code to
+the production MaveDB environment. The main difference between the production setup and these local setups is that
+the worker and api services are split into distinct environments, allowing them to scale up or down individually
+dependent on need.
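The README's notes flag a TODO about moving the container's environment variables into an .env file. A hypothetical sketch using the variable names from that list (every value here is a placeholder, not a real credential or default):

```shell
# Hypothetical .env for docker-compose-local.yml; replace every value.
DB_HOST=localhost
DB_PORT=5432
DB_DATABASE_NAME=mavedb
DB_USERNAME=mavedb_user
DB_PASSWORD=change-me
NCBI_API_KEY=your-ncbi-api-key
REDIS_IP=localhost
REDIS_PORT=6379
```

Compose can supply such a file to a container via its `env_file:` option, or use a file named `.env` in the project directory for variable interpolation.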

alembic/manual_migrations/README

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+This directory contains database migrations run manually for which there is no simple upgrade/downgrade path. They are not runnable as is and will need to be either manually added to an existing alembic migration or transformed into raw SQL statements and executed directly.

0 commit comments
