Bioinformatics pipeline for HPV detection in RNA sequencing specimens. "HPV-meta" performs several steps, including quality trimming (Trimmomatic and cutadapt), human genome filtering (NextGenMap), samtools, HPV detection (DIAMOND), cut-off settlement, coverage calculation, variant calling and FASTA generation.
Flowchart
Contents
The repository is organized as follows:
- src - contains the source Jupyter notebook files for each bioinformatics program
- settings - contains the configuration file, in YAML format, with the arguments for the programs
- jobs_config - Hopsworks Jobs configuration JSON files
- airflow_pipeline - Python script for running the pipeline as an Airflow DAG
- postprocessing - Python scripts for cut-off settlement, coverage analysis, variant calling and FASTA generation
To set up the pipeline, we first need a running Hopsworks cluster. To learn more about the open-source version of Hopsworks and its installation, check the GitHub repo or visit the official documentation.
Once the Hopsworks cluster is installed, the pipeline can be set up in the following steps:
- Create datasets for output and input. Upload the data into the input dataset
- Clone this repo into the Hopsworks project or upload the source code
- Modify the settings.yml
- Create jobs. You may use the jobs_config JSON files to import the job configs
- Upload the Airflow pipeline Python script, which contains the Airflow DAG. You have to modify the project name and user name and provide a unique DAG name
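The Airflow DAG runs the pipeline stages in dependency order. As a rough illustration only, the sketch below models that ordering in plain Python using the standard-library `graphlib` module. It assumes a strictly linear chain in the order the stages are listed at the top of this README; the stage labels are hypothetical and are not the actual job or DAG task names used in this repository.

```python
from graphlib import TopologicalSorter

# Hypothetical stage labels, taken from the step list in this README.
# They are NOT the real Hopsworks job names; check jobs_config for those.
STAGES = [
    "quality_trimming",
    "human_genome_filtering",
    "samtools",
    "hpv_detection",
    "cutoff_settlement",
    "coverage_calculation",
    "variant_calling",
    "fasta_generation",
]


def build_dependencies(stages):
    """Map each stage to the set of stages that must finish before it.

    Assumption: the pipeline is a linear chain, so every stage depends
    only on the stage listed immediately before it.
    """
    deps = {stages[0]: set()}
    for upstream, downstream in zip(stages, stages[1:]):
        deps[downstream] = {upstream}
    return deps


def execution_order(stages):
    """Return the stages in an order that respects all dependencies."""
    return list(TopologicalSorter(build_dependencies(stages)).static_order())


if __name__ == "__main__":
    for position, stage in enumerate(execution_order(STAGES), start=1):
        print(f"{position}. {stage}")
```

In the real DAG each stage would correspond to a Hopsworks job launched from Airflow; the dependency mapping here only mirrors the idea of declaring which task runs after which.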
When using the tool in published research, please cite: Ure A, Mukhedkar D, Arroyo Mühr LS. Using HPV-meta for human papillomavirus RNA quality detection. Sci Rep. 2022 Jul 29;12(1):13058. doi: 10.1038/s41598-022-17318-5. PMID: 35906372; PMCID: PMC9338075. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9338075/
