1. PTCP on HPC
The PacBio PureTarget Carrier Pipeline (PTCP) is a WDL-based workflow for genotyping tandem repeat regions and homologous genes with segmental duplications from PacBio PureTarget HiFi data. It uses a containerized toolchain for reproducibility on workstations and HPC clusters. By cloning this repository and setting up a virtual environment, you can install PTCP and run it on your HPC. This guide covers the following steps:
- Install requirements in a virtual environment
- Set up the image of PTCP dependencies
- Gather input information
- Run PTCP
Before you begin, make sure you have the following prerequisites available on your HPC or login node:
- Conda/Mamba (recommended on many HPCs) or venv for managing a Python environment
- Python 3.12 (or compatible with miniwdl)
- Either Docker or Singularity/Apptainer (choose one based on your environment)
- Optional: SLURM scheduler if you plan to use the miniwdl-slurm backend
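As a quick way to see which of these prerequisites are visible on a node, the sketch below looks up a few executables on PATH. The executable names (conda, mamba, uv, docker, apptainer, singularity, sbatch) are assumptions about a typical setup; on clusters that use environment modules you may need to load the relevant module first.

```python
import shutil
import sys

# Executable names are assumptions; adjust them to match your cluster.
ENV_MANAGERS = ("conda", "mamba", "uv")
CONTAINER_RUNTIMES = ("docker", "apptainer", "singularity")
SCHEDULER_COMMANDS = ("sbatch",)  # SLURM submit command, only needed for miniwdl-slurm


def find_tools(names):
    """Map each executable name to its location on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


def summarize():
    """Return one line per prerequisite group describing what was found."""
    lines = [f"python: {sys.version_info.major}.{sys.version_info.minor}"]
    groups = (
        ("environment manager", ENV_MANAGERS),
        ("container runtime", CONTAINER_RUNTIMES),
        ("scheduler (optional)", SCHEDULER_COMMANDS),
    )
    for label, names in groups:
        found = [name for name, path in find_tools(names).items() if path]
        lines.append(f"{label}: {', '.join(found) if found else 'none found'}")
    return lines


if __name__ == "__main__":
    print("\n".join(summarize()))
```

A "none found" line for the container runtime usually means the runtime is not installed on that node or is not yet on PATH.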
Below are example commands used to install the requirements needed to run PTCP. The examples show conda/mamba plus either pip (most universal) or uv (faster). Pick one approach based on what your HPC supports.
Tip: If you already have an environment named ptcp and want a clean rebuild, remove it first with conda env remove -n ptcp (or mamba env remove -n ptcp).
git clone https://github.com/PacificBiosciences/ptcp.git
cd ptcp/
conda create -n ptcp python=3.12.11
conda activate ptcp
python -m pip install -r requirements.txt
which miniwdl
miniwdl --version
The following option also installs packages into the active conda environment, but uses uv as the installer and resolver.
git clone https://github.com/PacificBiosciences/ptcp.git
cd ptcp/
mamba create -n ptcp python=3.12.11
conda activate ptcp
python -m pip install uv
uv pip install --system -r requirements.txt
which miniwdl
miniwdl --version
If your HPC does not support conda/mamba, you can use a standard virtual environment instead, created here with uv.
git clone https://github.com/PacificBiosciences/ptcp.git
cd ptcp/
uv venv --python 3.12.11
source .venv/bin/activate
uv pip install -r requirements.txt
which miniwdl
miniwdl --version
PTCP is a WDL workflow that relies on many software packages. These packages are containerized in a single image for PTCP to use. The core tools inside the container are TRGT, Paraphase, Sawfish, ptcp-qc, and SMRT Link.
The image can be run as a Docker container or as a Singularity image file (.sif). Instructions for setting up both are shown below.
A pre-built Docker image of these dependencies can be found on quay.io at: quay.io/pacbio/ptcp:3.2. Below is an example docker command used to pull the image of PTCP dependencies for version 3.2:
docker pull quay.io/pacbio/ptcp:3.2
To use Singularity/Apptainer instead, build a .sif file from the Docker image hosted on quay.io with apptainer (or singularity), using the image tag that matches the version of PTCP you are installing. An example command is shown below:
apptainer pull ptcp_3.2.sif docker://quay.io/pacbio/ptcp:3.2
Once you have built the .sif file, move it into a miniwdl_singularity_cache directory inside your clone of the ptcp repository, like so:
cd /Path/to/ptcp
mkdir miniwdl_singularity_cache
cd miniwdl_singularity_cache
mv /Path/to/Downloads/ptcp_3.2.sif .
PTCP requires several inputs, all summarized in an input JSON file. For detailed information on each of the input files required, please refer to the Input Files page. This section will briefly cover how to create the sample-sheet.csv and the input JSON file used to run the pipeline.
The sample-sheet.csv contains at least two comma-separated columns: bam_id and sex. A legacy third column (bam_name) is also supported. For more information on how to format this file, please refer to the Input Files page. You will need to create this file with one row per sample before you can create the input JSON in the next section.
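Before generating the input JSON, it can help to confirm that the sample sheet has the two required columns and no blank cells. The sketch below is a minimal check, not part of PTCP. It assumes the first row of the file is a header naming the columns; it does not validate the values themselves, since the accepted values are described on the Input Files page rather than here.

```python
import csv
import io

REQUIRED_COLUMNS = ("bam_id", "sex")  # column names taken from the text above


def check_sample_sheet(handle):
    """Return a list of problems found in a sample sheet; an empty list means none.

    Assumption: the first row is a header containing the column names.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for col in REQUIRED_COLUMNS:
            if not (row[col] or "").strip():
                problems.append(f"line {line_no}: empty {col}")
    return problems


if __name__ == "__main__":
    # Hypothetical rows; the values shown are placeholders only.
    example = io.StringIO("bam_id,sex\nsample_a,F\nsample_b,\n")
    for problem in check_sample_sheet(example):
        print(problem)
```

To check a real file, pass an open file handle, for example check_sample_sheet(open("/Path/to/sample-sheet.csv", newline="")).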
The input JSON contains all of the input information required by PTCP. Because the input JSON contains sample-specific information, it needs to be generated for each run of PureTarget Carrier Panel data you want to analyze. This repository contains a script called create_input_json.py to make it easier to generate the input JSON with the sample and other input information. This script requires a template JSON be passed to it with certain fields filled in.
An example template of the input JSON is provided in this repository called inputs_json_template.json. It is recommended that you copy this template and update all of the fields with preset example paths to point to their file locations on your server. For instance, in the example below, you should update all fields except for ptcp.sample_sheet, ptcp.hifi_reads, and ptcp.fail_reads:
{
  "ptcp.sample_sheet": "",
  "ptcp.ref_fasta": "/path/to/reference/hg38.fa",
  "ptcp.ref_index": "/path/to/reference/hg38.fa.fai",
  "ptcp.trgt_bed": "/path/to/ptcp/meta/trgt/PureTarget_repeat_expansion_panel_2.0.repeat_definition.GRCh38.bed",
  "ptcp.paraphase_config_yaml": "/path/to/ptcp/meta/paraphase/paraphase_config.GRCh38.yaml",
  "ptcp.paraphase_annotation_vcf": "/path/to/ptcp/meta/variant_list/variant_list.GRCh38.vcf",
  "ptcp.genome_version": "38",
  "ptcp.ptcp_qc_bed": "/path/to/ptcp/meta/ptcp-qc/ptcp-qc.GRCh38.bed",
  "ptcp.hifi_reads": [],
  "ptcp.fail_reads": []
}
Optional inputs you may add to (or remove from) the template:
- ptcp.paraphase_annotation_vcf (included in the repository template; omit this key to disable havanno_json)
- ptcp.pt_linear_regression (enables SMN homology correction)
- ptcp.docker_smrttools (defaults to quay.io/pacbio/ptcp:X.Y)
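A common mistake is leaving one of the example paths unchanged in the template. The sketch below is a hypothetical helper, not part of the repository; it reports template values that look like absolute paths but do not exist on disk. Treating only values that start with "/" as paths is an assumption made here so that plain strings such as ptcp.genome_version and image names are skipped.

```python
import json
from pathlib import Path

# Keys the text says are left for create_input_json.py to fill in.
FILLED_LATER = {"ptcp.sample_sheet", "ptcp.hifi_reads", "ptcp.fail_reads"}


def missing_paths(template):
    """Return {key: value} for absolute-path values that do not exist on disk."""
    missing = {}
    for key, value in template.items():
        if key in FILLED_LATER or not isinstance(value, str):
            continue
        # Heuristic (assumption): only values starting with "/" are paths.
        if value.startswith("/") and not Path(value).exists():
            missing[key] = value
    return missing


if __name__ == "__main__":
    # In practice, load your own copy of the template with json.load instead.
    example = json.loads(
        '{"ptcp.ref_fasta": "/path/to/reference/hg38.fa", "ptcp.genome_version": "38"}'
    )
    for key, value in missing_paths(example).items():
        print(f"{key}: not found: {value}")
```

Relative paths are not checked by this heuristic, so keep the template paths absolute if you rely on it.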
Once you have your template updated, you can use it to create the input JSON with the create_input_json.py script like so:
python docker/ptcp/scripts/create_input_json.py \
--data /Path/to/demux/PacBio/Run \
--sample_sheet /Path/to/sample-sheet.csv \
--template /Path/to/template.json \
> ptcp_inputs.json
PTCP also requires a MiniWDL configuration file as input. This file defines important runtime settings such as image cache, task concurrency, container backends, and more. You must update the configuration file to match your local system or execution environment.
Make sure to review and customize the configuration according to your hardware, software paths, and environment preferences before running the workflow.
An example MiniWDL configuration file is provided in this repository at tests/miniwdl.cfg.
For more information about configuring MiniWDL, refer to the official MiniWDL documentation.
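When reviewing a configuration file, it can be useful to print every section and option it sets. The sketch below does this with Python's standard configparser, on the assumption that the file is INI-style. The section and option names in the example ([scheduler], container_backend, task_concurrency) and their values are illustrative assumptions and are not copied from tests/miniwdl.cfg; check the MiniWDL documentation for the options your version supports.

```python
import configparser


def load_settings(text):
    """Parse INI-style configuration text into {section: {option: value}}."""
    # Interpolation is disabled so that literal "%" characters are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


if __name__ == "__main__":
    # Hypothetical settings for illustration only.
    example = """
[scheduler]
container_backend = singularity
task_concurrency = 4
"""
    for section, options in load_settings(example).items():
        for option, value in options.items():
            print(f"[{section}] {option} = {value}")
```

To inspect a real file, read its contents first, for example load_settings(open("/Path/to/miniwdl.cfg").read()).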
Once you have your environment configured, the image of dependencies installed, and your inputs gathered, you can run PTCP locally. Below is an example command used to run the pipeline:
conda activate ptcp
cd /Path/to/ptcp
miniwdl run \
--verbose \
--dir /Path/to/output_dir \
--cfg /Path/to/miniwdl.cfg \
--input /Path/to/ptcp_inputs.json \
main.wdl
Outputs and logs will appear under /Path/to/output_dir in a timestamped run directory.
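After several runs, the output directory will hold several timestamped run directories. The sketch below picks the newest one by name and loads its outputs file if present. It rests on two assumptions that you should verify against your own runs: that the run directory names start with a timestamp that sorts chronologically, and that each run directory contains a file named outputs.json. Symlinks are skipped in case the output directory also contains a link to the latest run.

```python
import json
from pathlib import Path


def latest_run_dir(output_dir):
    """Return the run directory whose name sorts last, or None if there is none.

    Assumption: directory names begin with a timestamp that sorts chronologically.
    """
    subdirs = [
        path for path in Path(output_dir).iterdir()
        if path.is_dir() and not path.is_symlink()
    ]
    return max(subdirs, key=lambda path: path.name, default=None)


def load_outputs(run_dir):
    """Load outputs.json from a run directory, or return None if it is absent."""
    outputs_file = Path(run_dir) / "outputs.json"  # file name is an assumption
    if not outputs_file.is_file():
        return None
    with outputs_file.open() as handle:
        return json.load(handle)
```

For example, load_outputs(latest_run_dir("/Path/to/output_dir")) would return the parsed outputs of the most recent run under these assumptions, provided at least one run directory exists.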