nf-core/chipseq

Files changed:
- CHANGELOG.md (18 additions, 21 deletions)
- README.md (12 additions, 9 deletions)
- assets/multiqc/deseq2_clustering_header.txt (2 additions, 2 deletions)
- assets/multiqc/deseq2_pca_header.txt (2 additions, 2 deletions)
- assets/multiqc/multiqc_config.yaml (45 additions, 6 deletions)
- bin/igv_files_to_session.py (9 additions, 4 deletions)
- bin/igv_get_files.sh (0 additions, 23 deletions)

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,25 +1,22 @@
 # nf-core/chipseq: Changelog
 
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
+and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
+
 ## v1.0dev - [date]
-* Add option for building BWA index for larger ref
-* Update software versions in uppmax-modules
-* Add function to validate input files in MACS config file
-* Remove ngsplot and move the function of plotProfile in deepTools
-* Support reference genome GRCm37
-* Add pre-defined genome sizes for all reference genomes to support macs2 peak calling and downstream processing
-* Add blacklist files for ce11, BDGP6, hg38, and mm9
-* Documents revised accordingly.
-* Major overhaul of docs and assets in-line with nf-core/tools v1.4
-* Added ability to use nf-core/configs along with associated docs
-* Updated manifest scope to deal with pipeline version
-* Removed NGI and SciLifeLab logos, and changed name of pipeline logo
-* Added awsbatch configuration
-* Put file() calls in fromFilePath()
-* Removed --project param specific to UPPMAX
-* Moved appropriate default params variables to nextflow.config
-* Changed Picard memory specification
-* Changed version number back to 1.0dev from 1.0
-* Updated conda packages
-* Major template changes in-line with nf-core/tools v1.5
+Initial release of nf-core/chipseq.
+
+### `Added`
 
-Repository moved from <https://github.com/SciLifeLab/NGI-ChIPseq>
+* Raw read QC (FastQC)
+* Adapter trimming (Trim Galore!)
+* Map and filter reads (BWA, picard, SAMtools, BEDTools, BAMTools, Pysam)
+* Create library-size normalised bigWig tracks (BEDTools, bedGraphToBigWig)
+* ChIP-seq QC metrics (deepTools, phantompeakqualtools)
+* Call and annotate broad/narrow peaks (MACS2, HOMER)
+* Create consensus set of peaks per antibody (BEDTools)
+* Quantification and differential binding analysis (featureCounts, DESeq2)
+* Collate appropriate files for genome browser visualisation (IGV)
+* Collate and present various QC metrics (MultiQC, R)
diff --git a/README.md b/README.md
@@ -30,15 +30,16 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool
         * reads that map to different chromosomes ([`Pysam`](http://pysam.readthedocs.io/en/latest/installation.html); *paired-end only*)
         * reads that arent in FR orientation ([`Pysam`](http://pysam.readthedocs.io/en/latest/installation.html); *paired-end only*)
         * reads where only one read of the pair fails the above criteria ([`Pysam`](http://pysam.readthedocs.io/en/latest/installation.html); *paired-end only*)
-    3. Create normalised bigWig files scaled to 1 million mapped reads ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`wigToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/))
-    4. Generate gene-body meta-profile from bigWig files ([`deepTools`](https://deeptools.readthedocs.io/en/develop/content/tools/plotProfile.html))
-    5. Calculate genome-wide IP enrichment relative to control ([`deepTools`](https://deeptools.readthedocs.io/en/develop/content/tools/plotFingerprint.html))
-    6. Calculate strand cross-correlation peak and ChIP-seq quality measures including NSC and RSC ([`phantompeakqualtools`](https://github.com/kundajelab/phantompeakqualtools))
-    7. Call broad/narrow peaks ([`MACS2`](https://github.com/taoliu/MACS))
-    8. Annotate peaks relative to gene features ([`HOMER`](http://homer.ucsd.edu/homer/download.html))
-    9. Create consensus peakset across all samples and create tabular file to aid in the filtering of the data ([`BEDTools`](https://github.com/arq5x/bedtools2/))
-    10. Count reads in consensus peaks ([`featureCounts`](http://bioinf.wehi.edu.au/featureCounts/))
-    11. Differential binding analysis, PCA and clustering ([`R`](https://www.r-project.org/), [`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html))
+    3. Alignment-level QC and estimation of library complexity ([`picard`](https://broadinstitute.github.io/picard/), [`Preseq`](http://smithlabresearch.org/software/preseq/))
+    4. Create normalised bigWig files scaled to 1 million mapped reads ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedGraphToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/))
+    5. Generate gene-body meta-profile from bigWig files ([`deepTools`](https://deeptools.readthedocs.io/en/develop/content/tools/plotProfile.html))
+    6. Calculate genome-wide IP enrichment relative to control ([`deepTools`](https://deeptools.readthedocs.io/en/develop/content/tools/plotFingerprint.html))
+    7. Calculate strand cross-correlation peak and ChIP-seq quality measures including NSC and RSC ([`phantompeakqualtools`](https://github.com/kundajelab/phantompeakqualtools))
+    8. Call broad/narrow peaks ([`MACS2`](https://github.com/taoliu/MACS))
+    9. Annotate peaks relative to gene features ([`HOMER`](http://homer.ucsd.edu/homer/download.html))
+    10. Create consensus peakset across all samples and create tabular file to aid in the filtering of the data ([`BEDTools`](https://github.com/arq5x/bedtools2/))
+    11. Count reads in consensus peaks ([`featureCounts`](http://bioinf.wehi.edu.au/featureCounts/))
+    12. Differential binding analysis, PCA and clustering ([`R`](https://www.r-project.org/), [`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html))
 6. Create IGV session file containing bigWig tracks, peaks and differential sites for data visualisation ([`IGV`](https://software.broadinstitute.org/software/igv/)).
 7. Present QC for raw read, alignment, peak-calling and differential binding results ([`MultiQC`](http://multiqc.info/), [`R`](https://www.r-project.org/))
 
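Step 4 scales coverage to 1 million mapped reads. A minimal sketch of that arithmetic, assuming a plain four-column bedGraph (chrom, start, end, value) and a mapped-read count obtained elsewhere; the function names are hypothetical and this is not the pipeline's own code. In practice such a factor is typically handed to a coverage tool, for example via `bedtools genomecov -scale`.

```python
def scale_factor(mapped_reads: int, target: int = 1_000_000) -> float:
    """Factor that rescales raw coverage to `target` mapped reads."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return target / mapped_reads


def scale_bedgraph_line(line: str, factor: float) -> str:
    """Multiply the value column of one 4-column bedGraph record by `factor`."""
    chrom, start, end, value = line.rstrip("\n").split("\t")
    return f"{chrom}\t{start}\t{end}\t{float(value) * factor:g}"


if __name__ == "__main__":
    factor = scale_factor(20_000_000)  # hypothetical library of 20M mapped reads
    print(factor)  # 0.05
    print(scale_bedgraph_line("chr1\t100\t200\t40\n", factor))  # chr1  100  200  2
```

Computing the factor once per library and applying it to every interval keeps tracks from libraries of different depths on a comparable scale.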
@@ -57,6 +58,8 @@ The nf-core/chipseq pipeline comes with documentation about the pipeline, found
 ## Credits
 These scripts were orginally written by Chuan Wang ([@chuan-wang](https://github.com/chuan-wang)) and Phil Ewels ([@ewels](https://github.com/ewels)) for use at the [National Genomics Infrastructure](https://portal.scilifelab.se/genomics/) at [SciLifeLab](http://www.scilifelab.se/) in Stockholm, Sweden. It has since been re-implemented by Harshil Patel ([@drpatelh](https://github.com/drpatelh)) from [The Bioinformatics & Biostatistics Group](https://www.crick.ac.uk/research/science-technology-platforms/bioinformatics-and-biostatistics/) at [The Francis Crick Institute](https://www.crick.ac.uk/), London.
 
+Many thanks to others who have helped out along the way too, including (but not limited to): [@apeltzer](https://github.com/apeltzer), [@bc2zb](https://github.com/bc2zb), [@drejom](https://github.com/drejom), [@KevinMenden](https://github.com/KevinMenden), [@pditommaso](https://github.com/pditommaso).
+
 ## Citation
 
 <!-- TODO nf-core: Add citation for pipeline after release. Uncomment lines below and add citation. -->
 
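diff --git a/assets/multiqc/deseq2_clustering_header.txt b/assets/multiqc/deseq2_clustering_header.txt
@@ -1,9 +1,9 @@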
 #id: 'deseq2_clustering'
 #section_name: 'DESeq2: Sample similarity'
-#description: "is generated from clustering by Euclidean distances between
+#description: " matrix is generated from clustering by Euclidean distances between
 #	       <a href='https://bioconductor.org/packages/release/bioc/html/DESeq2.html' target='_blank'>DESeq2</a>
 #              rlog values for each sample
-#              in the <a href='https://github.com/nf-core/atacseq/blob/master/bin/featurecounts_deseq2.r'><code>featurecounts_deseq2.r</code></a> script."
+#              (see <a href='https://github.com/nf-core/chipseq/blob/master/bin/featurecounts_deseq2.r'><code>featurecounts_deseq2.r</code></a> script)."
 #plot_type: 'heatmap'
 #anchor: 'nfcore_chipseq-deseq2_clustering'
 #pconfig:
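The clustering header describes a sample-similarity matrix built from Euclidean distances between DESeq2 rlog values. The pipeline does this in R (`featurecounts_deseq2.r`); the Python sketch below only illustrates the distance computation, assuming each sample's rlog values are already available as equal-length lists over the same consensus peaks. Sample names and values are made up.

```python
import math
from typing import Dict, List


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(rlog: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise sample-by-sample distances (symmetric, zero diagonal)."""
    samples = list(rlog)
    return {s: {t: euclidean(rlog[s], rlog[t]) for t in samples} for s in samples}


if __name__ == "__main__":
    # Hypothetical rlog values for three samples over three consensus peaks.
    rlog = {
        "WT_R1": [5.0, 7.0, 3.0],
        "WT_R2": [5.0, 7.0, 6.0],
        "KO_R1": [1.0, 4.0, 3.0],
    }
    for sample, row in distance_matrix(rlog).items():
        print(sample, [round(d, 2) for d in row.values()])
```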
 
diff --git a/assets/multiqc/deseq2_pca_header.txt b/assets/multiqc/deseq2_pca_header.txt
@@ -1,8 +1,8 @@
 #id: 'deseq2_pca'
 #section_name: 'DESeq2: PCA plot'
-#description: "PCA plot between samples in the experiment.
+#description: "between samples in the experiment.
 #              These values are calculated using <a href='https://bioconductor.org/packages/release/bioc/html/DESeq2.html'>DESeq2</a>
-#              in the <a href='https://github.com/nf-core/atacseq/blob/master/bin/featurecounts_deseq2.r'><code>featurecounts_deseq2.r</code></a> script."
+#              in the <a href='https://github.com/nf-core/chipseq/blob/master/bin/featurecounts_deseq2.r'><code>featurecounts_deseq2.r</code></a> script."
 #plot_type: 'scatter'
 #anchor: 'nfcore_chipseq-deseq2_pca'
 #pconfig:
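Both header files follow MultiQC's custom-content convention of YAML configuration written as `#`-prefixed lines at the top of a data file. A minimal sketch of rendering such a file, assuming the header values are supplied already formatted as YAML (quotes included) and that tab-separated data rows follow; the helper name and the rows are hypothetical, and the exact column layout MultiQC expects depends on the plot type.

```python
from typing import Dict, Iterable, Sequence


def format_custom_content(header: Dict[str, str], rows: Iterable[Sequence[object]]) -> str:
    """Render '#key: value' header lines followed by tab-separated data rows."""
    lines = [f"#{key}: {value}" for key, value in header.items()]
    lines += ["\t".join(str(field) for field in row) for row in rows]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    header = {
        "id": "'deseq2_pca'",
        "section_name": "'DESeq2: PCA plot'",
        "plot_type": "'scatter'",
        "anchor": "'nfcore_chipseq-deseq2_pca'",
    }
    # Hypothetical PC1/PC2 coordinates; real values come from the R script.
    rows = [("WT_R1", -4.2, 1.1), ("KO_R1", 4.2, -1.1)]
    print(format_custom_content(header, rows), end="")
```

Returning a string rather than writing directly keeps the formatting testable; the caller decides where the file goes.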
 
diff --git a/assets/multiqc/multiqc_config.yaml b/assets/multiqc/multiqc_config.yaml
@@ -3,8 +3,6 @@ report_comment: >
     analysis pipeline. For information about how to interpret these results, please see the
     <a href="https://github.com/nf-core/chipseq/blob/master/docs/output.md" target="_blank">documentation</a>.
 
-skip_generalstats: true
-
 export_plots: true
 
 fn_clean_exts:
@@ -49,6 +47,11 @@ module_order:
         info: 'This section of the report shows SAMTools results after merging libraries and before filtering.'
         path_filters:
             - '*mLb.mkD.sorted.bam*'
+    - preseq:
+        name: 'Preseq (merged library; unfiltered)'
+        info: 'This section of the report shows Preseq results after merging libraries and before filtering.'
+        path_filters:
+            - '*mLb*'
     - samtools:
         name: 'SAMTools (merged library; filtered)'
         info: 'This section of the report shows SAMTools results after merging libraries and after filtering.'
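`path_filters` restricts each module instance to files whose paths match a glob. Assuming MultiQC applies fnmatch-style matching, the new Preseq entry's `*mLb*` pattern is broader than the unfiltered SAMTools entry's `*mLb.mkD.sorted.bam*`. The sketch below uses hypothetical file names and case-sensitive matching to show the difference.

```python
from fnmatch import fnmatchcase
from typing import List

# Patterns copied from the module_order entries in this hunk.
PRESEQ_FILTERS = ["*mLb*"]
SAMTOOLS_UNFILTERED_FILTERS = ["*mLb.mkD.sorted.bam*"]


def matches_any(path: str, patterns: List[str]) -> bool:
    """True if `path` matches at least one shell-style glob (case-sensitive)."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


if __name__ == "__main__":
    # Hypothetical output file names, for illustration only.
    names = [
        "WT_R1.mLb.mkD.sorted.bam.flagstat",
        "WT_R1.mLb.mkD.ccurve.txt",
        "WT_R1.Lb.sorted.bam.flagstat",
    ]
    for name in names:
        print(name, matches_any(name, PRESEQ_FILTERS), matches_any(name, SAMTOOLS_UNFILTERED_FILTERS))
```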
@@ -83,14 +86,50 @@ report_section_order:
         order: -1400
     peak_annotation:
         order: -1500
-    deseq2_pca:
+    deseq2_pca_1:
         order: -1600
-    deseq2_clustering:
+    deseq2_pca_2:
         order: -1700
-    software_versions:
+    deseq2_pca_3:
         order: -1800
-    nf-core-chipseq-summary:
+    deseq2_pca_4:
         order: -1900
+    deseq2_pca_5:
+        order: -2000
+    deseq2_pca_6:
+        order: -2100
+    deseq2_pca_7:
+        order: -2200
+    deseq2_pca_8:
+        order: -2300
+    deseq2_pca_9:
+        order: -2400
+    deseq2_pca_10:
+        order: -2500
+    deseq2_clustering_1:
+        order: -2600
+    deseq2_clustering_2:
+        order: -2700
+    deseq2_clustering_3:
+        order: -2800
+    deseq2_clustering_4:
+        order: -2900
+    deseq2_clustering_5:
+        order: -3000
+    deseq2_clustering_6:
+        order: -3100
+    deseq2_clustering_7:
+        order: -3200
+    deseq2_clustering_8:
+        order: -3300
+    deseq2_clustering_9:
+        order: -3400
+    deseq2_clustering_10:
+        order: -3500
+    software_versions:
+        order: -3600
+    nf-core-chipseq-summary:
+        order: -3700
 
 custom_plot_config:
     picard-insertsize:
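The expanded `report_section_order` reserves ten numbered PCA anchors and ten numbered clustering anchors, each `order` value 100 lower than the one before, followed by `software_versions` and the summary. A short sketch that regenerates those entries, assuming the same start (-1600), step (100) and slot count (10) as the block in this hunk; the function names are hypothetical.

```python
from typing import Dict


def section_orders(start: int = -1600, step: int = 100, slots: int = 10) -> Dict[str, int]:
    """Map section anchors to MultiQC 'order' values, decreasing by `step`."""
    names = [f"deseq2_pca_{i}" for i in range(1, slots + 1)]
    names += [f"deseq2_clustering_{i}" for i in range(1, slots + 1)]
    names += ["software_versions", "nf-core-chipseq-summary"]
    return {name: start - step * offset for offset, name in enumerate(names)}


def to_yaml(orders: Dict[str, int], indent: str = "    ") -> str:
    """Render the mapping in the nested layout used by multiqc_config.yaml."""
    lines = []
    for name, order in orders.items():
        lines.append(f"{indent}{name}:")
        lines.append(f"{indent * 2}order: {order}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_yaml(section_orders()))
```

Generating the block keeps the spacing consistent if the number of reserved slots ever changes.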
 
diff --git a/bin/igv_files_to_session.py b/bin/igv_files_to_session.py
@@ -24,6 +24,9 @@
 argParser.add_argument('XML_OUT', help="XML output file.")
 argParser.add_argument('LIST_FILE', help="Tab-delimited file containing two columns i.e. file_name\tcolour. Header isnt required.")
 argParser.add_argument('GENOME', help="Full path to genome fasta file or shorthand for genome available in IGV e.g. hg19.")
+
+## OPTIONAL PARAMETERS
+argParser.add_argument('-pp', '--path_prefix', type=str, dest="PATH_PREFIX", default='', help="Path prefix to be added at beginning of all files in input list file.")
 args = argParser.parse_args()
 
 ############################################
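The new optional `-pp/--path_prefix` argument defaults to an empty string, so existing invocations keep working unchanged. A minimal sketch that mirrors the argument definitions in this hunk and parses two hypothetical command lines; the parser description and the file names are placeholders.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Positional and optional arguments as declared in igv_files_to_session.py."""
    parser = argparse.ArgumentParser(description="Create an IGV session file.")  # placeholder text
    parser.add_argument("XML_OUT", help="XML output file.")
    parser.add_argument("LIST_FILE", help="Tab-delimited file with file_name and colour columns.")
    parser.add_argument("GENOME", help="Genome fasta path or IGV shorthand e.g. hg19.")
    parser.add_argument("-pp", "--path_prefix", type=str, dest="PATH_PREFIX", default="",
                        help="Path prefix added to the start of every file in the list file.")
    return parser


if __name__ == "__main__":
    parser = build_parser()
    old_style = parser.parse_args(["igv_session.xml", "igv_files.txt", "hg19"])
    new_style = parser.parse_args(["igv_session.xml", "igv_files.txt", "hg19", "--path_prefix", "../../"])
    print(repr(old_style.PATH_PREFIX), repr(new_style.PATH_PREFIX))
```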
@@ -47,7 +50,7 @@ def makedir(path):
 ############################################
 ############################################
 
-def igv_files_to_session(XMLOut,ListFile,Genome):
+def igv_files_to_session(XMLOut,ListFile,Genome,PathPrefix=''):
 
     makedir(os.path.dirname(XMLOut))
 
@@ -57,7 +60,9 @@ def igv_files_to_session(XMLOut,ListFile,Genome):
         line = fin.readline()
         if line:
             ifile,colour = line.strip().split('\t')
-            fileList.append((ifile,colour))
+            if len(colour.strip()) == 0:
+                colour = '0,0,178'
+            fileList.append((PathPrefix.strip()+ifile,colour))
         else:
             break
             fout.close()
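The loop in this hunk reads `file_name<TAB>colour` lines, falls back to `0,0,178` when the colour is blank, and prepends the path prefix. Because `line.strip()` also removes a trailing tab, a line such as `peaks.bed\t` yields a single field, and the two-name unpacking raises `ValueError` before the blank-colour check is reached. A minimal sketch of the same parsing that strips only the line ending, assuming an empty or missing second column means "use the default colour" and that blank lines can be skipped; the function name is hypothetical.

```python
from typing import Iterable, List, Tuple

DEFAULT_COLOUR = "0,0,178"  # fallback colour used by igv_files_to_session.py


def parse_list_file(lines: Iterable[str], path_prefix: str = "") -> List[Tuple[str, str]]:
    """Return (prefixed_path, colour) pairs from 'file_name<TAB>colour' lines."""
    entries = []
    for line in lines:
        line = line.rstrip("\r\n")  # keep a trailing tab so an empty colour survives
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        colour = fields[1].strip() if len(fields) > 1 else ""
        entries.append((path_prefix.strip() + fields[0], colour or DEFAULT_COLOUR))
    return entries


if __name__ == "__main__":
    # Hypothetical list file contents.
    example = ["sample1.bigWig\t255,0,0\n", "consensus_peaks.bed\t\n", "sample1_peaks.narrowPeak\n"]
    for entry in parse_list_file(example, path_prefix="../../"):
        print(entry)
```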
@@ -74,7 +79,7 @@ def igv_files_to_session(XMLOut,ListFile,Genome):
     XMLStr += '\t<Panel height="1160" name="DataPanel" width="1897">\n'
     for ifile,colour in fileList:
         extension = os.path.splitext(ifile)[1].lower()
-        if extension in ['.bed']:
+        if extension in ['.bed','.broadpeak','.narrowpeak']:
             XMLStr += '\t\t<Track altColor="0,0,178" autoScale="false" clazz="org.broad.igv.track.FeatureTrack" color="%s" ' % (colour)
             XMLStr += 'displayMode="SQUISHED" featureVisibilityWindow="-1" fontSize="10" height="20" '
             XMLStr += 'id="%s" name="%s" renderer="BASIC_FEATURE" sortable="false" visible="true" windowFunction="count"/>\n' % (ifile,os.path.basename(ifile))
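Because the extension is lower-cased before the comparison, MACS2 `.narrowPeak` and `.broadPeak` files now receive the same FeatureTrack element as `.bed` files. A sketch of that branch built with `xml.etree.ElementTree` instead of string concatenation, so characters such as `&` in file names are escaped; the attribute values are copied from this hunk, the function names are hypothetical, and the other track types handled elsewhere in the script are not covered.

```python
import os
import xml.etree.ElementTree as ET

FEATURE_EXTENSIONS = {".bed", ".broadpeak", ".narrowpeak"}


def is_feature_file(path: str) -> bool:
    """True for files rendered as IGV FeatureTracks (case-insensitive extension)."""
    return os.path.splitext(path)[1].lower() in FEATURE_EXTENSIONS


def feature_track(path: str, colour: str) -> ET.Element:
    """Build a <Track> element with the attributes the script writes for feature files."""
    return ET.Element("Track", {
        "altColor": "0,0,178",
        "autoScale": "false",
        "clazz": "org.broad.igv.track.FeatureTrack",
        "color": colour,
        "displayMode": "SQUISHED",
        "featureVisibilityWindow": "-1",
        "fontSize": "10",
        "height": "20",
        "id": path,
        "name": os.path.basename(path),
        "renderer": "BASIC_FEATURE",
        "sortable": "false",
        "visible": "true",
        "windowFunction": "count",
    })


if __name__ == "__main__":
    path = "peaks/WT_R1_peaks.narrowPeak"  # hypothetical file name
    if is_feature_file(path):
        print(ET.tostring(feature_track(path, "0,0,178"), encoding="unicode"))
```

Note that a compressed file such as `peaks.narrowPeak.gz` has the extension `.gz` and so would not match this check.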
@@ -108,7 +113,7 @@ def igv_files_to_session(XMLOut,ListFile,Genome):
 ############################################
 ############################################
 
-igv_files_to_session(XMLOut=args.XML_OUT,ListFile=args.LIST_FILE,Genome=args.GENOME)
+igv_files_to_session(XMLOut=args.XML_OUT,ListFile=args.LIST_FILE,Genome=args.GENOME,PathPrefix=args.PATH_PREFIX)
 
 ############################################
 ############################################