A base template for creating Django applications for collecting patient data.
Note
This repo is an adaptation of https://github.com/ssec-jhu/biospecdb.
This database application stands as an online community collection point for patient biosample array data and metadata, to be used as an AI training set, generating a tool for easy disease detection from future patient biosamples.
For additional cmds see the Conda cheat-sheet.
- Download and install either miniconda or anaconda.
- Create new environment (env) and install
conda create -n <environment_name> - Activate/switch to new env
conda activate <environment_name> cdinto repo dir.- Install
pythonandpipconda install python=3.11 pip - Install all required dependencies (assuming local dev work)
pip install -r requirements/dev.txt.
- Follow the run instructions below using docker-compose.
cdinto repo dir.conda activate <environment_name>- Build and install package in <environment_name> conda env:
pip install . - Do the same but in dev/editable mode (changes to repo will be reflected in env installation upon python kernel restart)
NOTE: This is the preferred installation method for dev work.
pip install -e .. NOTE: If you didn't install dependencies fromrequirements/dev.txt, you can install a looser constrained set of deps using:pip install -e .[dev].
- Download & install Docker - see Docker install docs.
cdinto repo dir.- Setup env vars:
export DJANGO_SUPERUSER_PASSWORD=adminsource scripts/prd/gen_secret_key.sh
- Copy both the SSL certificate and the private key to
nginx/cert.crt&nginx/cert.keyrespectively. For local testing of the prd service see creating self-signed SSL certificates for instructions on how to create "fake" ones. - Build and run with
docker compose up, add-dto run in the background. - The site can then be accessed using any browser from
https://localhost
- Follow the above Build with Python ecosystem instructions.
- For a completely fresh start and rebuild of the database:
./scripts/dev/rebuild_sqlite.sh. - Run
DJANGO_SETTINGS_MODULE=biodb.settings.dev python manage.py runserver 0.0.0.0:8000 - The site can then be accessed using any browser from
http://localhost:8000
Warning! This is intended of local testing only and not for use in production.
- Install
openssl cd nginxopenssl req -newkey rsa:4096 -nodes -x509 -out cert.crt -keyout cert.key -days 365Note: This certificate is valid for 365 days. Also, when accessing https://localhost your browser will flag the site as unsafe and the certificate as invalid.
EXPLORER_CHARTS_ENABLED: Include the array data files, if present in query results, for download as zip file.EXPLORER_DATA_EXPORTERS_ALLOW_DATA_FILE_ALIAS: Exhaustively scan query result values for relevant filepaths to collect data files. Does nothing whenEXPLORER_DATA_EXPORTERS_INCLUDE_DATA_FILES is False.AUTO_ANNOTATE: Automatically run "default" annotators when new array data is added. Quality Control Annotations.RUN_DEFAULT_ANNOTATORS_WHEN_SAVED: Run newly added/updated annotator on all array data if annotator.default is True. WARNING: This may be time-consuming if the annotators takes a while to run and there are a lot of array data samples in the database. See Quality Control Annotations.
We're currently using sqlite requiring the following setup instructions:
For a quickstart run the provided script rebuild_sqlite.sh, otherwise follow the instructions
below. The default superuser credentials are username: admin, password: admin. Set the env var
DJANGO_SUPERUSER_PASSWORD to override the default given in rebuild_sqlite.sh.
_NOTE: For postgresql usage, run the provided script rebuild_postgres.sh.
- cd into repo
python manage.py migratepython manage.py migrate --database=bsrpython manage.py createsuperuserpython manage.py loaddata centers queriespython manage.py loaddata --database=bsr centers observables instruments qcannotators biosampletypes arraymeasurementtypespython manage.py update_sql_views flat_viewpython manage.py runserver
For running the Quality Control Annotators (QCAnnotators) use the following:
python manage.py run_qc_annotators
...using the following option --no_reruns to NOT run annotators on existing annotations, but instead leave their
existing computed values as they are. This will, however, still run all "default" annotators on all SepectralData
entries that have not yet been annotated.
On subsequent deployments only python manage.py runserver is needed, unless the db (db.sqlite) is nuked from
disk.
When the models are changed only the following migration commands are required:
python manage.py makemigrations userpython manage.py makemigrations uploaderpython manage.py makemigrations cataloggit add biodb/apps/uploader/migrationsgit add biodb/apps/user/migrationsgit commit -asm"Update model migrations"python manage.py migratepython manage.py migrate --database=bsr
The DB can be dumped to a file using the following:
python manage.py dumpdata --indent 4 uploader --exclude uploader.uploadedfile --output test_data.json
python manage.py prune_files [--dry_run]: Delete any and all orphaned data files.--dry_run: Output files to be deleted but don't actually delete anything.
python manage.py update_sql_views <view>: Create/update the custom SQL view provided and its view dependencies if any.--drop_only: Drop SQL view (and dependencies) but don't re-create.
python manage.py run_qc_annotators [--no_reruns]: Run all Quality Control annotators on the ArrayData database table.--no_reruns: Don't run annotators on existing annotations, leave computed values as is.
python manage.py get_column_names [--exlcude_observables] [--exclude_non_observables] [--center=<name|id>]--exlcude_observables: Only output column names for all observables currently in the database.--exclude_non_observables: Only output column names for non-observables.--center=<name|id>: Filter observables by center name or center ID.--category=<category>: Filter observables by category.--descriptions: Also print field.help_text and observation.description.--include_instrument_fields: Also include Instrument fields. Note: These are not used for bulk uploads, only the database Instrument ID is used. Therefore these aren't that useful to list. Does nothing when used with --exclude_non_observables.
python manage.py send_test_email <send_to_email_address>: Send a test email to "send_to_email_address" to test email setup.python manage.py makesuperuser: This is a wrapper of Django's builtincreatesuperusercommand except that it doesn't fail when the user already exists.- See
python manage.py createsuperuser --helpfor options. --fail: Revert tocreatesuperuserbehavior and fail when the user already exists.
- See
_NOTE: These commands must be run from the /app/ directory on the server.
The above management commands (and others) can be run in production from an EC2 instance correctly configured. To aid in shell setup, the following script can be executed:
source repo/biodb/scripts/prd/ec2_init.sh
_NOTE: scripts/prd/ec2_init.sh will export all AWS secrets as shell environment variables.
catalog/: Access cataloged datasets to download and explore for research purposes.data/: Data ingestion and editing. The following paths are the principal ingestion methods:data/uploader/patient/add/: Add all data associated with a given patient, new or existing.data/uploader/uploadedfile/add/: Bulk upload data in a tabulated format, e.g.,.csv,.xlsx, &.json.
explorer/: SQL UI interface for direct data exploration. Privileged user permissions required.admin/: Access to patient data, explorer queries, center info, user info, and general database admin. Privileged user permissions required. Note that the data input forms are simpler (not nested) than those atdata/.healthz/: Display simple health check system status for both the web application and backend infrastructure.
Entries in the ArrayData table can be annotated by running QCAnnotators on them, thus producing a value
stored as an ACAnnotation associated with the ArrayData entry. The ArrayData table contains the actual
array data values. It may be desirable to annotate this data with
certain quality control metrics that can later be used to filter the data. Such quality control functions are to be
implemented as a subclass of biodb.app.uploader.qc.qcfilter.QcFilter.
They can then be added to the database belonging to the QCAnnotator
table. Annotations of this annotators can then either be manually associated with a ArrayData entry manually via
the admin form, or by "default" if the QCAnnotator.default = True. They can also be run by using the
run_qc_annotator Django management command. The behavior for running these annotators and
population of the QCAnnotation table is configurable and is described below.
The following QC annotator settings are available in biodb.settings.base:
AUTO_ANNOTATE: IfTrueand if default annotators exist in the DB, they will be automatically run upon adding/updatingArrayDataentries. (Default: True)RUN_DEFAULT_ANNOTATORS_WHEN_SAVED: IfTrueandArrayDataentries exist in the DB, newly added/updated default annotators will be run on allArrayDataentries. (Default: False)
Running python manage.py run_qc_annotors will run all existing default annotators and re-run existing annotations on
all relevant ArrayData table entries. The option --no_reruns can be used to prevent re-running existing
annotations and only run default annotators that have not yet been run.
NOTE: If AUTO_ANNOTATE = RUN_DEFAULT_ANNOTATORS_WHEN_SAVED = False using the run_qc_annotors management command
is the only mechanism for creating quality control annotations.
NOTE: The following steps require pip install -r requirements/dev.txt.
Facilitates in testing typos, syntax, style, and other simple code analysis tests.
cdinto repo dir.- Switch/activate correct environment:
conda activate <environment_name> - Run
ruff . - This can be automatically run (recommended for devs) every time you
git pushby installing the providedpre-pushgit hook available in./githooks. Instructions are in that file - justcp ./githooks/pre-push .git/hooks/;chmod +x .git/hooks/pre-push.
Facilitates in checking for security concerns using Bandit.
cdinto repo dir.bandit -c pyproject.toml --severity-level=medium -r biodb
Facilitates in testing core package functionality at a modular level.
cdinto repo dir.- Run all available tests:
pytest . - Run specific test:
pytest tests/test_util.py::test_base_dummy.
Facilitates in testing whether core data results differ during development.
- WIP
Facilitates in testing at the application and infrastructure level.
- WIP
Facilitates in building, testing & viewing the docs.
cdinto repo dir.pip install -r requirements/docs.txtcd docsmake cleanmake html- To view the docs in your default browser run
open docs/_build/html/index.html.
When constructing bulk data files to be uploaded, the server must parse matching column names with model fields. To
obtain a list of these column names run python manage.py get_column_names.
See Custom Commands
