Merge pull request #79 from drpatelh/master

drpatelh · web-flow · commit da9653a0c543 · 2019-05-16T17:38:07.000+01:00
Bug fixes
diff --git a/README.md b/README.md
@@ -57,4 +57,12 @@ The nf-core/chipseq pipeline comes with documentation about the pipeline, found
 ## Credits
 These scripts were written for use at the [National Genomics Infrastructure](https://portal.scilifelab.se/genomics/)
 at [SciLifeLab](http://www.scilifelab.se/) in Stockholm, Sweden.
-Originally written by Chuan Wang ([@chuan-wang](https://github.com/chuan-wang)) and Phil Ewels ([@ewels](https://github.com/ewels)), and re-implemented by Harshil Patel ([@drpatelh](https://github.com/drpatelh)) from the [The Bioinformatics & Biostatistics Group](https://www.crick.ac.uk/research/science-technology-platforms/bioinformatics-and-biostatistics/) at [The Francis Crick Institute](https://www.crick.ac.uk/), London.
+Originally written by Chuan Wang ([@chuan-wang](https://github.com/chuan-wang)) and Phil Ewels ([@ewels](https://github.com/ewels)), and re-implemented by Harshil Patel ([@drpatelh](https://github.com/drpatelh)) from [The Bioinformatics & Biostatistics Group](https://www.crick.ac.uk/research/science-technology-platforms/bioinformatics-and-biostatistics/) at [The Francis Crick Institute](https://www.crick.ac.uk/), London.
+
+## Citation
+
+<!-- TODO nf-core: Add citation for pipeline after release. Uncomment lines below and add citation. -->
+<!-- If you use nf-core/chipseq for your analysis, please cite it as follows: -->
+
+You can cite the `nf-core` pre-print as follows:  
+Ewels PA, Peltzer A, Fillinger S, Alneberg JA, Patel H, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. **nf-core: Community curated bioinformatics pipelines**. *bioRxiv*. 2019. p. 610741. [doi: 10.1101/610741](https://www.biorxiv.org/content/10.1101/610741v1).
diff --git a/assets/multiqc_config.yaml b/assets/multiqc_config.yaml
diff --git a/bin/check_design.py b/bin/check_design.py
@@ -79,16 +79,6 @@ def reformat_design(DesignFile,ReadMappingFile,ControlMappingFile):
                     print "{}: FastQ file has incorrect extension (has to be '.fastq.gz' or 'fq.gz') - {}\nLine: '{}'".format(ERROR_STR,fastq,line.strip())
                     sys.exit(1)
 
-                ## CHECK FASTQ FILES EXIST PER SAMPLE
-                if fastq[:4] not in ['http']:
-                    if not os.path.exists(fastq):
-                        print "{}: FastQ file does not exist - {}\nLine: '{}'".format(ERROR_STR,fastq,line.strip())
-                        sys.exit(1)
-                else:
-                    if requests.head(fastq).status_code >= 400:
-                        print "{}: FastQ file does not exist - {}\nLine: '{}'".format(ERROR_STR,fastq,line.strip())
-                        sys.exit(1)
-
             ## CREATE GROUP MAPPING DICT = {GROUP_ID: {REPLICATE_ID:[[FASTQ_FILES]]}
             if not sampleMappingDict.has_key(group):
                 sampleMappingDict[group] = {}
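With the per-sample existence check removed, the FastQ extension test shown in the hunk's context is the only per-file validation left at this point in `check_design.py`. The full condition is not visible here, so the sketch below is an assumption. It uses Python 3, while the original script is Python 2. It assumes that both accepted suffixes carry a leading dot (`.fastq.gz`, `.fq.gz`). `has_valid_fastq_extension` and `VALID_FASTQ_EXTENSIONS` are hypothetical names.

```python
# Hypothetical sketch of the extension check; the real condition in
# bin/check_design.py is not shown in this hunk.
VALID_FASTQ_EXTENSIONS = (".fastq.gz", ".fq.gz")  # assumed accepted suffixes


def has_valid_fastq_extension(path: str) -> bool:
    """Return True if the FastQ path ends with one of the accepted gzipped suffixes."""
    # str.endswith accepts a tuple and matches if any suffix matches.
    return path.endswith(VALID_FASTQ_EXTENSIONS)


if __name__ == "__main__":
    for fastq in ["sample_R1.fastq.gz", "sample_R1.fq.gz", "sample_R1.fastq"]:
        print(fastq, has_valid_fastq_extension(fastq))
```

Passing a tuple to `str.endswith` keeps the accepted suffixes in one place, so each one does not need its own comparison.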
diff --git a/docs/output.md b/docs/output.md
diff --git a/docs/usage.md b/docs/usage.md
@@ -14,8 +14,9 @@
   * [`--design`](#--design)
 * [Generic arguments](#generic-arguments)
   * [`--singleEnd`](#--singleend)
-  * [`--narrowPeak`](#--narrowpeak)
+  * [`--seqCenter`](#--seqcenter)
   * [`--fragment_size`](#--fragment_size)
+  * [`--fingerprintBins`](#--fingerprintbins)
 * [Reference genomes](#reference-genomes)
   * [`--genome` (using iGenomes)](#--genome-using-igenomes)
   * [`--fasta`](#--fasta)
@@ -34,6 +35,11 @@
   * [`--keepDups`](#--keepdups)
   * [`--keepMultiMap`](#--keepmultimap)
   * [`--saveAlignedIntermediates`](#--savealignedintermediates)
+* [Peaks](#peaks)
+  * [`--narrowPeak`](#--narrowpeak)
+  * [`--broad_cutoff`](#--broad_cutoff)
+  * [`--saveMACSPileup`](#--savemacspileup)
+  * [`--skipDiffAnalysis`](#--skipdiffanalysis)
 * [Job resources](#job-resources)
   * [Automatic resubmission](#automatic-resubmission)
   * [Custom resource requests](#custom-resource-requests)
@@ -184,12 +190,17 @@ By default, the pipeline expects paired-end data. If you have single-end data, s
 
 It is not possible to run a mixture of single-end and paired-end files in one run.
 
-### `--narrowPeak`
-MACS2 is run by default with the [`--broad`](https://github.com/taoliu/MACS#--broad) flag. Specify this flag to call peaks in narrowPeak mode.
+### `--seqCenter`
+Sequencing center information that will be added to read groups in BAM files.
 
 ### `--fragment_size`
 Number of base pairs to extend single-end reads when creating bigWig files. Default: `0`
 
+### `--fingerprintBins`
+Number of genomic bins to use when generating the deepTools fingerprint plot. Larger numbers will give a smoother profile, but take longer to run.
+
+Default: `500000`
+
 ## Reference genomes
 
 The pipeline config files come bundled with paths to the illumina iGenomes reference index files. If running with docker or AWS, the configuration is set up to use the [AWS-iGenomes](https://ewels.github.io/AWS-iGenomes/) resource.
@@ -310,6 +321,20 @@ Reads mapping to multiple locations in the genome are not filtered from alignmen
 ### `--saveAlignedIntermediates`
 By default, intermediate BAM files will not be saved. The final BAM files created after the appropriate filtering step are always saved to limit storage usage. Set to true to also save other intermediate BAM files.
 
+## Peaks
+
+### `--narrowPeak`
+MACS2 is run by default with the [`--broad`](https://github.com/taoliu/MACS#--broad) flag. Specify this flag to call peaks in narrowPeak mode.
+
+### `--broad_cutoff`
+Specifies the broad cutoff value for MACS2. Only used when `--narrowPeak` isn't specified. Default: `0.1`
+
+### `--saveMACSPileup`
+Instruct MACS2 to create bedGraph files using the `-B --SPMR` parameters.
+
+### `--skipDiffAnalysis`
+Skip the read counting and differential analysis steps.
+
 ## Job resources
 ### Automatic resubmission
 Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with an error code of `143` (exceeded requested resources) it will automatically resubmit with higher requests (2 x original, then 3 x original). If it still fails after three times then the pipeline is stopped.
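The new "Peaks" section describes three interacting options. By default MACS2 uses broad mode with `--broad_cutoff` (default `0.1`), `--narrowPeak` switches broad mode off, and `--saveMACSPileup` adds `-B --SPMR`. The minimal Python sketch below models that mapping as a plain function. The pipeline builds the real MACS2 command in `main.nf`, which this hunk does not show, so `macs2_peak_args` and its argument names are hypothetical. Only the MACS2 flags `--broad`, `--broad-cutoff`, `-B` and `--SPMR` are real.

```python
from typing import List


def macs2_peak_args(narrow_peak: bool = False,
                    broad_cutoff: float = 0.1,
                    save_pileup: bool = False) -> List[str]:
    """Hypothetical helper: build the peak-mode part of a MACS2 callpeak command line."""
    args: List[str] = []
    if not narrow_peak:
        # Default behaviour: call broad peaks; the cutoff only applies in this mode.
        args += ["--broad", "--broad-cutoff", str(broad_cutoff)]
    if save_pileup:
        # -B writes bedGraph pileups; --SPMR scales them to signal per million reads.
        args += ["-B", "--SPMR"]
    return args


if __name__ == "__main__":
    print(macs2_peak_args())
    print(macs2_peak_args(narrow_peak=True, save_pileup=True))
```

With the defaults this returns `['--broad', '--broad-cutoff', '0.1']`. With `narrow_peak=True, save_pileup=True` it returns `['-B', '--SPMR']`, which shows that `--broad_cutoff` is ignored once narrowPeak mode is requested.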
diff --git a/main.nf b/main.nf
@@ -26,13 +26,13 @@ def helpMessage() {
                                     Available: conda, docker, singularity, awsbatch, test
 
     Generic
-      --genome                      Name of iGenomes reference
       --singleEnd                   Specifies that the input is single-end reads
       --seqCenter                   Sequencing center information to be added to read group of BAM files
       --fragment_size [int]         Estimated fragment size used to extend single-end reads. Default: 0
       --fingerprintBins             Number of genomic bins to use when calculating fingerprint plot. Default: 500000
 
     References                      If not specified in the configuration file or you wish to overwrite any of the references
+      --genome                      Name of iGenomes reference
       --bwa_index                   Full path to directory containing BWA index including base name i.e. /path/to/index/genome.fa
       --gene_bed                    Path to BED file containing gene intervals
       --tss_bed                     Path to BED file containing transcription start sites
@@ -320,7 +320,7 @@ if (params.singleEnd) {
 if (!params.bwa_index){
     process makeBWAindex {
         tag "$fasta"
-        label 'process_big'
+        label 'process_high'
         publishDir path: { params.saveGenomeIndex ? "${params.outdir}/reference_genome" : params.outdir },
                    saveAs: { params.saveGenomeIndex ? it : null }, mode: 'copy'
 
@@ -515,7 +515,7 @@ if (params.skipTrimming){
  */
 process bwaMEM {
     tag "$name"
-    label 'process_big'
+    label 'process_high'
 
     input:
     set val(name), file(reads) from ch_trimmed_reads
@@ -861,7 +861,7 @@ process bigWig {
  */
 process plotProfile {
     tag "$name"
-    label 'process_big'
+    label 'process_high'
     publishDir "${params.outdir}/bwa/mergedLibrary/deepTools/plotProfile", mode: 'copy'
 
     input:
@@ -1062,7 +1062,7 @@ process peakQC {
  */
 process plotFingerprint {
     tag "${ip} vs ${control}"
-    label 'process_big'
+    label 'process_high'
     publishDir "${params.outdir}/bwa/mergedLibrary/deepTools/plotFingerprint", mode: 'copy'
 
     input:
diff --git a/nextflow.config b/nextflow.config
@@ -9,13 +9,13 @@
 params {
 
   // Options: Generic
-  genome = false
   singleEnd = false
   seqCenter = false
   fragment_size = 0
   fingerprintBins = 500000
 
   // Options: References
+  genome = false
   tss_bed = false
   saveGenomeIndex = false