Ingest

This workflow ingests public data from Pathoplexus and outputs curated metadata and sequences that can be used as input for the phylogenetic workflow.

If you have another data source or private data that needs to be formatted for the phylogenetic workflow, then you can use a similar workflow to curate your own data.

Workflow Usage

The workflow can be run from the top level pathogen repo directory:

nextstrain build ingest

Alternatively, the workflow can be run from within the ingest directory:

cd ingest
nextstrain build .

This produces the default outputs of the ingest workflow:

metadata = results/metadata.tsv
sequences = results/sequences.fasta

Defaults

The defaults directory contains all of the default configurations for the ingest workflow.

defaults/config.yaml contains all of the default configuration parameters used for the ingest workflow. Use Snakemake's --configfile/--config options to override these default values.
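The override behavior can be pictured as layered dictionaries: values from a later config source replace matching keys from the defaults, while keys that are not overridden keep their default values. The following is a minimal Python sketch of that idea, not Snakemake's actual implementation. The function name `merge_config` and the keys `curate`, `id_field`, `strain_field`, and `threads` are hypothetical and are not taken from defaults/config.yaml.

```python
from copy import deepcopy


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a new dict where values in `overrides` replace those in `defaults`.

    Nested dictionaries are merged key by key; any other value is replaced whole.
    """
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical keys, for illustration only (not taken from defaults/config.yaml).
defaults = {"curate": {"id_field": "accession", "strain_field": "strain"}, "threads": 1}
overrides = {"curate": {"id_field": "custom_id"}, "threads": 4}

config = merge_config(defaults, overrides)
print(config)
```

In this sketch `curate.strain_field` keeps its default because the override only names `curate.id_field`, which is the practical reason to override individual values rather than copy the whole default config.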

Snakefile and rules

The rules directory contains separate Snakefiles (*.smk) as modules of the core ingest workflow. The modules of the workflow are in separate files to keep the main ingest Snakefile succinct and organized.

The workdir is hardcoded to be the ingest directory, so all file paths for inputs and outputs should be relative to the ingest directory.

Modules are all included in the main Snakefile in the order that they are expected to run.

Nextclade

Nextstrain is pushing to standardize ingest workflows with Nextclade runs to include Nextclade outputs in our publicly hosted data. However, if a Nextclade dataset does not already exist, creating one requires curated data as input, so we are making the Nextclade steps optional here.

If Nextclade config values are included, the Nextclade rules will create the final metadata TSV by joining the Nextclade output with the metadata. If Nextclade configs are not included, we rename the subset metadata TSV to the final metadata TSV.
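The two branches described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `final_metadata`, the column names `accession`, `seqName`, and `clade`, and the choice of a left join are hypothetical and are not taken from the workflow's rules.

```python
import csv
import io


def final_metadata(metadata_tsv, nextclade_tsv=None,
                   id_column="accession", nextclade_id="seqName"):
    """Return the final metadata rows as a list of dicts.

    Without Nextclade output, the metadata is passed through unchanged.
    With it, the Nextclade columns are appended to rows with a matching ID.
    """
    rows = list(csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t"))
    if nextclade_tsv is None:
        return rows

    reader = csv.DictReader(io.StringIO(nextclade_tsv), delimiter="\t")
    extra_columns = [name for name in reader.fieldnames if name != nextclade_id]
    by_id = {row[nextclade_id]: row for row in reader}

    joined = []
    for row in rows:
        match = by_id.get(row[id_column], {})
        # Left join (an assumption): keep every metadata row and leave the
        # Nextclade columns empty when there is no matching record.
        joined.append({**row, **{name: match.get(name, "") for name in extra_columns}})
    return joined


# Made-up records, for illustration only.
metadata = "accession\tcountry\nA1\tKenya\nA2\tPeru\n"
nextclade = "seqName\tclade\nA1\tcladeX\n"

print(final_metadata(metadata))             # no Nextclade config: pass-through
print(final_metadata(metadata, nextclade))  # Nextclade config: joined columns
```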

To run Nextclade rules, include the defaults/nextclade_config.yaml config file with:

nextstrain build ingest --configfile defaults/nextclade_config.yaml

Tip

If the Nextclade dataset is stable and you always want to run the Nextclade rules as part of ingest, we recommend moving the Nextclade-related config parameters from defaults/nextclade_config.yaml to the default config file, defaults/config.yaml.

Build configs

The build-configs directory contains custom configs and rules that override and/or extend the default workflow.

nextstrain-automation - automated internal Nextstrain builds

Uploads results files to our S3 bucket

nextstrain build ingest --configfile build-configs/nextstrain-automation/config.yaml -f upload_all
