INSPIIRED optimization with NextFlow

This Nextflow pipeline is based on the intSiteCaller module of the INSPIIRED software. It follows the same general workflow as the original pipeline, with several enhancements:

- Accepts BCL or FASTQ inputs
- Faster alignment and overall execution time
- Support for multiple samples
- An alternative insertion site detection strategy that counts each exact insertion once and enables identification of clonal expansions

Workflow diagram

Pre-requirements

In order to run this pipeline, some prerequisites must be met:

The --samplesheet parameter must be provided with the path to the samplesheet. The samplesheet must contain the following columns:
- Sample_ID: Sample ID
- index: Index sequence (sample unique linker)
- index2: Second index sequence (Golay sequence)
- common_linker: Common linker sequence
- primer: Primer sequence
- ltrbit: LTR bit sequence
- largeLTRFrag: Large LTR fragment sequence
- Sample_Project: Project name
- mingDNA: Minimum DNA length
- minPctIdent: Minimum percentage of identity
- maxAlignStart: Maximum alignment start
- maxFragLength: Maximum fragment length
- refGenome: Reference genome name
- vectorSeq: Vector sequence path
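
The column list above can be checked before launching a run. The following is a minimal Python sketch, not part of the pipeline: `missing_columns` is a hypothetical helper, and it assumes the samplesheet is a plain comma-separated file whose first row holds the column names. If your sheet carries extra sections before the column row, the parsing would need to be adapted.

```python
import csv
import io

# Column names copied from the list above.
REQUIRED_COLUMNS = [
    "Sample_ID", "index", "index2", "common_linker", "primer", "ltrbit",
    "largeLTRFrag", "Sample_Project", "mingDNA", "minPctIdent",
    "maxAlignStart", "maxFragLength", "refGenome", "vectorSeq",
]


def missing_columns(samplesheet_text):
    """Return the required columns absent from the first row of the sheet.

    Assumption: the first row of the text is the header row.
    """
    reader = csv.reader(io.StringIO(samplesheet_text))
    header = [name.strip() for name in next(reader, [])]
    return [col for col in REQUIRED_COLUMNS if col not in header]
```

To use it, read the file given to --samplesheet and pass its text to `missing_columns`; an empty list means every required column is present.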

The --instrument parameter must be provided; it names the sequencing instrument used (for example, MiSeq).

If the input is a BCL run folder:
- The --runfolderDir parameter must be provided with the path to the BCL run folder. This folder must be in the same directory as the pipeline.
If the input is FASTQ files:
- The --runfolderDir parameter must be provided; it can be any folder in the same directory as the pipeline and is used as a reference path.
- The --FASTQfolderDir parameter must be provided with the path to the folder containing the Undetermined FASTQ files.
- The --readStructure parameter must be provided; it describes the layout of template (T) and barcode (B) sequences. If a read has 34 template nucleotides and the 12-nucleotide barcode is in a separate read, the read structure is 34T 12B. If the barcode is inside the read, the read structure is 12B34T. In our case, the read structure will mostly be 20B+T 12B +T.
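
To make the read structure notation concrete, here is a minimal Python sketch that splits a read structure string into its segments. It is an illustration only and not the parser the pipeline uses. It assumes, based on the examples above, that each space-separated part describes one read, that each segment is a length followed by a single uppercase letter, and that `+` stands for "all remaining bases". `parse_read_structure` is a hypothetical name.

```python
import re

# One segment: a number or '+', followed by one uppercase letter (e.g. 20B, +T).
SEGMENT = re.compile(r"(\d+|\+)([A-Z])")


def parse_read_structure(read_structure):
    """Split e.g. '20B+T 12B +T' into one list of (length, kind) per read.

    A length of None stands for '+', i.e. the rest of the read (assumption).
    """
    reads = []
    for per_read in read_structure.split():
        segments = SEGMENT.findall(per_read)
        # Reject any part the pattern did not consume completely.
        if "".join(length + kind for length, kind in segments) != per_read:
            raise ValueError(f"cannot parse read structure part: {per_read!r}")
        reads.append([(None if length == "+" else int(length), kind)
                      for length, kind in segments])
    return reads
```

With this sketch, `12B34T` yields one read with a 12-base barcode followed by a 34-base template, while `34T 12B` yields two reads, one template and one barcode.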
The container images described in the .def files must be built and available.
The FASTA file of the vector's genomic sequence must be available in the same directory as the pipeline.
The FASTA file of the reference genome must be available in the same directory as the pipeline; its name must start with the name of the genome (hg19, hg38...) and end with the .fa extension.
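
The naming rule for the reference genome can be checked with a few lines of Python. This is a hedged sketch rather than pipeline code: `find_reference_fasta` is a hypothetical helper, and it assumes the genome name is the same string used in the refGenome column of the samplesheet.

```python
from pathlib import Path


def find_reference_fasta(pipeline_dir, ref_genome):
    """Return the sorted .fa files in pipeline_dir whose names start with ref_genome."""
    return sorted(
        path for path in Path(pipeline_dir).iterdir()
        if path.is_file()
        and path.name.startswith(ref_genome)
        and path.name.endswith(".fa")
    )
```

An empty result for a genome named in the samplesheet suggests the file is missing or misnamed, for example carrying a .fasta extension instead of .fa.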

Running the pipeline

With a BCL input, run the pipeline with the following command:

nextflow run main.nf \
    --BCLorFASTQ BCL \
    --runfolderDir /path/to/BCL/Run/Folder \
    --samplesheet /path/to/SampleSheet.csv \
    --projectName ProjectName \
    --readStructure '20B+T 12B +T' \
    -with-report reports/ProjectName_report.html \
    --instrument 'MiSeq' \
    -with-trace reports/ProjectName_trace.txt \
    -resume

With a FASTQ input, run the pipeline with the following command:

nextflow run main.nf \
    --BCLorFASTQ FASTQ \
    --runfolderDir /path/to/FASTQ/Run/Folder \
    --samplesheet /path/to/SampleSheet.csv \
    --FASTQfolderDir /path/to/Undetermined_FASTQ_Files \
    --readStructure '20B+T 12B +T' \
    --projectName ProjectName \
    -with-report reports/ProjectName_report.html \
    --instrument 'MiSeq' \
    -with-trace reports/ProjectName_trace.txt \
    -resume
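
The two invocations differ mainly in which parameters each input type requires. As a minimal sketch, assuming you launch the pipeline from a Python wrapper, the helper below checks the mandatory parameters named in the prerequisites and assembles the argument list. `build_nextflow_command` and `REQUIRED_PARAMS` are hypothetical names, not part of the pipeline, and reporting options such as -with-report are left out for brevity.

```python
# Parameters the prerequisites list as mandatory for each input type.
REQUIRED_PARAMS = {
    "BCL": ["runfolderDir", "samplesheet", "instrument"],
    "FASTQ": ["runfolderDir", "samplesheet", "instrument",
              "FASTQfolderDir", "readStructure"],
}


def build_nextflow_command(input_type, params):
    """Return the argument list for `nextflow run main.nf` (hypothetical helper)."""
    if input_type not in REQUIRED_PARAMS:
        raise ValueError(f"input type must be BCL or FASTQ, got {input_type!r}")
    missing = [name for name in REQUIRED_PARAMS[input_type] if name not in params]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    command = ["nextflow", "run", "main.nf", "--BCLorFASTQ", input_type]
    # Pipeline parameters use a double dash, as in the commands above.
    for name, value in params.items():
        command += [f"--{name}", str(value)]
    command.append("-resume")
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)`; passing a list rather than a single string avoids shell quoting problems with values such as '20B+T 12B +T'.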

Output

The output of the pipeline will be in the results folder. The folder structure will be as follows:

results
├── 00_normalized_index_length
│   ├── ProjectName (every step will have a folder with the project name)
│   │   ├── files
├── 1_demuxed
├── 2_extractedumi
├── 3_fastqcraw
├── 4_trimmedfastq_fastqc
├── 5_removed_n
├── 6_fastqctrimmed
├── 7_multiqcaftertrim
├── 8_LTR_presence
├── 9_reverse_complement_removal
├── 10_findvector
├── 11_short_remove
├── 12_genome_index
├── 13_alignment
├── 14_index_sort_bam
├── 15_allsites
├── 16_sitesfinal
├── 17_sitesfinal_to_points

LICENSE

This project is licensed under the GPLv3 License; see the LICENSE file for details.
