
Commit 83dc9cb

Merge pull request #85 from drpatelh/master
Added skip QC options and other parameters
2 parents ead5edf + 195b366 commit 83dc9cb

15 files changed: 443 additions, 256 deletions

CHANGELOG.md

Lines changed: 18 additions & 21 deletions
```diff
@@ -1,25 +1,22 @@
 # nf-core/chipseq: Changelog
 
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
+and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
+
 ## v1.0dev - [date]
-* Add option for building BWA index for larger ref
-* Update software versions in uppmax-modules
-* Add function to validate input files in MACS config file
-* Remove ngsplot and move the function of plotProfile in deepTools
-* Support reference genome GRCm37
-* Add pre-defined genome sizes for all reference genomes to support macs2 peak calling and downstream processing
-* Add blacklist files for ce11, BDGP6, hg38, and mm9
-* Documents revised accordingly.
-* Major overhaul of docs and assets in-line with nf-core/tools v1.4
-* Added ability to use nf-core/configs along with associated docs
-* Updated manifest scope to deal with pipeline version
-* Removed NGI and SciLifeLab logos, and changed name of pipeline logo
-* Added awsbatch configuration
-* Put file() calls in fromFilePath()
-* Removed --project param specific to UPPMAX
-* Moved appropriate default params variables to nextflow.config
-* Changed Picard memory specification
-* Changed version number back to 1.0dev from 1.0
-* Updated conda packages
-* Major template changes in-line with nf-core/tools v1.5
+Initial release of nf-core/chipseq.
+
+### `Added`
 
-Repository moved from <https://github.com/SciLifeLab/NGI-ChIPseq>
+* Raw read QC (FastQC)
+* Adapter trimming (Trim Galore!)
+* Map and filter reads (BWA, picard, SAMtools, BEDTools, BAMTools, Pysam)
+* Create library-size normalised bigWig tracks (BEDTools, bedGraphToBigWig)
+* ChIP-seq QC metrics (deepTools, phantompeakqualtools)
+* Call and annotate broad/narrow peaks (MACS2, HOMER)
+* Create consensus set of peaks per antibody (BEDTools)
+* Quantification and differential binding analysis (featureCounts, DESeq2)
+* Collate appropriate files for genome browser visualisation (IGV)
+* Collate and present various QC metrics (MultiQC, R)
```

README.md

Lines changed: 12 additions & 9 deletions
```diff
@@ -30,15 +30,16 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool
 * reads that map to different chromosomes ([`Pysam`](http://pysam.readthedocs.io/en/latest/installation.html); *paired-end only*)
 * reads that aren't in FR orientation ([`Pysam`](http://pysam.readthedocs.io/en/latest/installation.html); *paired-end only*)
 * reads where only one read of the pair fails the above criteria ([`Pysam`](http://pysam.readthedocs.io/en/latest/installation.html); *paired-end only*)
-3. Create normalised bigWig files scaled to 1 million mapped reads ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`wigToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/))
-4. Generate gene-body meta-profile from bigWig files ([`deepTools`](https://deeptools.readthedocs.io/en/develop/content/tools/plotProfile.html))
-5. Calculate genome-wide IP enrichment relative to control ([`deepTools`](https://deeptools.readthedocs.io/en/develop/content/tools/plotFingerprint.html))
-6. Calculate strand cross-correlation peak and ChIP-seq quality measures including NSC and RSC ([`phantompeakqualtools`](https://github.com/kundajelab/phantompeakqualtools))
-7. Call broad/narrow peaks ([`MACS2`](https://github.com/taoliu/MACS))
-8. Annotate peaks relative to gene features ([`HOMER`](http://homer.ucsd.edu/homer/download.html))
-9. Create consensus peakset across all samples and create tabular file to aid in the filtering of the data ([`BEDTools`](https://github.com/arq5x/bedtools2/))
-10. Count reads in consensus peaks ([`featureCounts`](http://bioinf.wehi.edu.au/featureCounts/))
-11. Differential binding analysis, PCA and clustering ([`R`](https://www.r-project.org/), [`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html))
+3. Alignment-level QC and estimation of library complexity ([`picard`](https://broadinstitute.github.io/picard/), [`Preseq`](http://smithlabresearch.org/software/preseq/))
+4. Create normalised bigWig files scaled to 1 million mapped reads ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedGraphToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/))
+5. Generate gene-body meta-profile from bigWig files ([`deepTools`](https://deeptools.readthedocs.io/en/develop/content/tools/plotProfile.html))
+6. Calculate genome-wide IP enrichment relative to control ([`deepTools`](https://deeptools.readthedocs.io/en/develop/content/tools/plotFingerprint.html))
+7. Calculate strand cross-correlation peak and ChIP-seq quality measures including NSC and RSC ([`phantompeakqualtools`](https://github.com/kundajelab/phantompeakqualtools))
+8. Call broad/narrow peaks ([`MACS2`](https://github.com/taoliu/MACS))
+9. Annotate peaks relative to gene features ([`HOMER`](http://homer.ucsd.edu/homer/download.html))
+10. Create consensus peakset across all samples and create tabular file to aid in the filtering of the data ([`BEDTools`](https://github.com/arq5x/bedtools2/))
+11. Count reads in consensus peaks ([`featureCounts`](http://bioinf.wehi.edu.au/featureCounts/))
+12. Differential binding analysis, PCA and clustering ([`R`](https://www.r-project.org/), [`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html))
 6. Create IGV session file containing bigWig tracks, peaks and differential sites for data visualisation ([`IGV`](https://software.broadinstitute.org/software/igv/)).
 7. Present QC for raw read, alignment, peak-calling and differential binding results ([`MultiQC`](http://multiqc.info/), [`R`](https://www.r-project.org/))
 
```
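The README's bigWig step scales coverage to 1 million mapped reads. As a minimal sketch, assuming the factor is 10^6 divided by the number of mapped reads (the kind of constant one could pass to `bedtools genomecov -scale`), the normalisation can be illustrated on bedGraph lines. The helper names are hypothetical and not taken from the pipeline, which delegates this work to BEDTools.

```python
"""Illustrative sketch: scale bedGraph coverage to 1 million mapped reads.

Hypothetical helpers; not the pipeline's own implementation.
"""


def scale_factor(mapped_reads: int) -> float:
    """Return the factor that normalises coverage to 1 million mapped reads."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return 1_000_000 / mapped_reads


def scale_bedgraph(lines, factor):
    """Multiply the fourth (value) column of each bedGraph line by factor."""
    scaled = []
    for line in lines:
        chrom, start, end, value = line.rstrip("\n").split("\t")
        scaled.append(f"{chrom}\t{start}\t{end}\t{float(value) * factor:g}")
    return scaled


if __name__ == "__main__":
    # Hypothetical library with 20 million mapped reads.
    factor = scale_factor(20_000_000)
    print(factor)
    print(scale_bedgraph(["chr1\t0\t100\t40\n"], factor))
```

Scaling by a single per-library constant keeps the relative shape of each track unchanged while making signal heights comparable between libraries of different depth.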

```diff
@@ -57,6 +58,8 @@ The nf-core/chipseq pipeline comes with documentation about the pipeline, found
 ## Credits
 These scripts were originally written by Chuan Wang ([@chuan-wang](https://github.com/chuan-wang)) and Phil Ewels ([@ewels](https://github.com/ewels)) for use at the [National Genomics Infrastructure](https://portal.scilifelab.se/genomics/) at [SciLifeLab](http://www.scilifelab.se/) in Stockholm, Sweden. It has since been re-implemented by Harshil Patel ([@drpatelh](https://github.com/drpatelh)) from [The Bioinformatics & Biostatistics Group](https://www.crick.ac.uk/research/science-technology-platforms/bioinformatics-and-biostatistics/) at [The Francis Crick Institute](https://www.crick.ac.uk/), London.
 
+Many thanks to others who have helped out along the way too, including (but not limited to): [@apeltzer](https://github.com/apeltzer), [@bc2zb](https://github.com/bc2zb), [@drejom](https://github.com/drejom), [@KevinMenden](https://github.com/KevinMenden), [@pditommaso](https://github.com/pditommaso).
+
 ## Citation
 
 <!-- TODO nf-core: Add citation for pipeline after release. Uncomment lines below and add citation. -->
```

assets/multiqc/deseq2_clustering_header.txt

Lines changed: 2 additions & 2 deletions
```diff
@@ -1,9 +1,9 @@
 #id: 'deseq2_clustering'
 #section_name: 'DESeq2: Sample similarity'
-#description: "is generated from clustering by Euclidean distances between
+#description: " matrix is generated from clustering by Euclidean distances between
 # <a href='https://bioconductor.org/packages/release/bioc/html/DESeq2.html' target='_blank'>DESeq2</a>
 # rlog values for each sample
-# in the <a href='https://github.com/nf-core/atacseq/blob/master/bin/featurecounts_deseq2.r'><code>featurecounts_deseq2.r</code></a> script."
+# (see <a href='https://github.com/nf-core/chipseq/blob/master/bin/featurecounts_deseq2.r'><code>featurecounts_deseq2.r</code></a> script)."
 #plot_type: 'heatmap'
 #anchor: 'nfcore_chipseq-deseq2_clustering'
 #pconfig:
```

assets/multiqc/deseq2_pca_header.txt

Lines changed: 2 additions & 2 deletions
```diff
@@ -1,8 +1,8 @@
 #id: 'deseq2_pca'
 #section_name: 'DESeq2: PCA plot'
-#description: "PCA plot between samples in the experiment.
+#description: "between samples in the experiment.
 # These values are calculated using <a href='https://bioconductor.org/packages/release/bioc/html/DESeq2.html'>DESeq2</a>
-# in the <a href='https://github.com/nf-core/atacseq/blob/master/bin/featurecounts_deseq2.r'><code>featurecounts_deseq2.r</code></a> script."
+# in the <a href='https://github.com/nf-core/chipseq/blob/master/bin/featurecounts_deseq2.r'><code>featurecounts_deseq2.r</code></a> script."
 #plot_type: 'scatter'
 #anchor: 'nfcore_chipseq-deseq2_pca'
 #pconfig:
```
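Both `*_header.txt` files carry MultiQC custom-content settings (`id`, `section_name`, `description`, `plot_type`, `anchor`, `pconfig`) as `#`-prefixed YAML lines, presumably placed above the plot data in the final file. A minimal sketch of separating such a header from the data rows, assuming that layout; the helper name and the example data row are hypothetical, and turning the header text into a dictionary would need a YAML parser such as PyYAML, which is not shown.

```python
def split_custom_content(lines):
    """Split leading '#'-prefixed header lines from the data lines after them.

    The leading '#' is removed from each header line so the joined text can
    be handed to a YAML parser; everything after the header is data.
    """
    header, data = [], []
    in_header = True
    for line in lines:
        line = line.rstrip("\n")
        if in_header and line.startswith("#"):
            header.append(line[1:])
        else:
            in_header = False
            data.append(line)
    return "\n".join(header), data


example = [
    "#id: 'deseq2_pca'\n",
    "#section_name: 'DESeq2: PCA plot'\n",
    "#plot_type: 'scatter'\n",
    "sample_A\t1.5\t-0.3\n",  # hypothetical data row
]
header_text, rows = split_custom_content(example)
print(header_text)
print(rows)
```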

assets/multiqc/multiqc_config.yaml

Lines changed: 45 additions & 6 deletions
```diff
@@ -3,8 +3,6 @@ report_comment: >
 analysis pipeline. For information about how to interpret these results, please see the
 <a href="https://github.com/nf-core/chipseq/blob/master/docs/output.md" target="_blank">documentation</a>.
 
-skip_generalstats: true
-
 export_plots: true
 
 fn_clean_exts:
```
```diff
@@ -49,6 +47,11 @@ module_order:
 info: 'This section of the report shows SAMTools results after merging libraries and before filtering.'
 path_filters:
 - '*mLb.mkD.sorted.bam*'
+- preseq:
+name: 'Preseq (merged library; unfiltered)'
+info: 'This section of the report shows Preseq results after merging libraries and before filtering.'
+path_filters:
+- '*mLb*'
 - samtools:
 name: 'SAMTools (merged library; filtered)'
 info: 'This section of the report shows SAMTools results after merging libraries and after filtering.'
```
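The new Preseq entry uses a broader `path_filters` pattern (`'*mLb*'`) than the unfiltered SAMTools entry (`'*mLb.mkD.sorted.bam*'`). Assuming these patterns follow shell-style glob semantics as implemented by Python's `fnmatch`, the difference can be checked directly; the file names below are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns copied from the module_order entries in multiqc_config.yaml.
SAMTOOLS_UNFILTERED = "*mLb.mkD.sorted.bam*"
PRESEQ_UNFILTERED = "*mLb*"

# Hypothetical output file names, used only for illustration.
files = [
    "WT_R1.mLb.mkD.sorted.bam.flagstat",
    "WT_R1.mLb.ccurve.txt",
    "WT_R1.Lb.sorted.bam.flagstat",
]

for name in files:
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch.
    print(
        name,
        fnmatchcase(name, SAMTOOLS_UNFILTERED),
        fnmatchcase(name, PRESEQ_UNFILTERED),
    )
```

Under these semantics, `'*mLb*'` matches any path containing `mLb`, so it selects every merged-library Preseq result rather than only files from the marked-duplicate BAM.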
```diff
@@ -83,14 +86,50 @@ report_section_order:
 order: -1400
 peak_annotation:
 order: -1500
-deseq2_pca:
+deseq2_pca_1:
 order: -1600
-deseq2_clustering:
+deseq2_pca_2:
 order: -1700
-software_versions:
+deseq2_pca_3:
 order: -1800
-nf-core-chipseq-summary:
+deseq2_pca_4:
 order: -1900
+deseq2_pca_5:
+order: -2000
+deseq2_pca_6:
+order: -2100
+deseq2_pca_7:
+order: -2200
+deseq2_pca_8:
+order: -2300
+deseq2_pca_9:
+order: -2400
+deseq2_pca_10:
+order: -2500
+deseq2_clustering_1:
+order: -2600
+deseq2_clustering_2:
+order: -2700
+deseq2_clustering_3:
+order: -2800
+deseq2_clustering_4:
+order: -2900
+deseq2_clustering_5:
+order: -3000
+deseq2_clustering_6:
+order: -3100
+deseq2_clustering_7:
+order: -3200
+deseq2_clustering_8:
+order: -3300
+deseq2_clustering_9:
+order: -3400
+deseq2_clustering_10:
+order: -3500
+software_versions:
+order: -3600
+nf-core-chipseq-summary:
+order: -3700
 
 custom_plot_config:
 picard-insertsize:
```
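The new `report_section_order` entries follow a regular pattern: after `peak_annotation` (-1500), ten `deseq2_pca_N` sections run from -1600 to -2500, ten `deseq2_clustering_N` sections from -2600 to -3500, then `software_versions` (-3600) and `nf-core-chipseq-summary` (-3700). A short sketch that regenerates those values; the function name is hypothetical, and the limit of ten per plot type is simply what the config lists.

```python
def deseq2_section_order(n_max=10, start=-1500, step=100):
    """Rebuild the report_section_order values added in multiqc_config.yaml.

    Each section sits `step` below the previous one, starting just below
    `start`: all PCA sections first, then all clustering sections, then
    software_versions and the pipeline summary.
    """
    names = [f"deseq2_pca_{i}" for i in range(1, n_max + 1)]
    names += [f"deseq2_clustering_{i}" for i in range(1, n_max + 1)]
    names += ["software_versions", "nf-core-chipseq-summary"]
    return {name: start - step * (k + 1) for k, name in enumerate(names)}


if __name__ == "__main__":
    # Print in the same key/order layout used by the YAML config.
    for name, order in deseq2_section_order().items():
        print(f"{name}:\n    order: {order}")
```

Generating the block this way makes the fixed spacing explicit; supporting more than ten sections per plot type would mean changing `n_max` and shifting the two trailing entries accordingly.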

bin/igv_files_to_session.py

Lines changed: 9 additions & 4 deletions
```diff
@@ -24,6 +24,9 @@
 argParser.add_argument('XML_OUT', help="XML output file.")
 argParser.add_argument('LIST_FILE', help="Tab-delimited file containing two columns i.e. file_name\tcolour. Header isn't required.")
 argParser.add_argument('GENOME', help="Full path to genome fasta file or shorthand for genome available in IGV e.g. hg19.")
+
+## OPTIONAL PARAMETERS
+argParser.add_argument('-pp', '--path_prefix', type=str, dest="PATH_PREFIX", default='', help="Path prefix to be added at beginning of all files in input list file.")
 args = argParser.parse_args()
 
 ############################################
```
############################################
@@ -47,7 +50,7 @@ def makedir(path):
4750
############################################
4851
############################################
4952

50-
def igv_files_to_session(XMLOut,ListFile,Genome):
53+
def igv_files_to_session(XMLOut,ListFile,Genome,PathPrefix=''):
5154

5255
makedir(os.path.dirname(XMLOut))
5356

```diff
@@ -57,7 +60,9 @@ def igv_files_to_session(XMLOut,ListFile,Genome):
 line = fin.readline()
 if line:
 ifile,colour = line.strip().split('\t')
-fileList.append((ifile,colour))
+if len(colour.strip()) == 0:
+colour = '0,0,178'
+fileList.append((PathPrefix.strip()+ifile,colour))
 else:
 break
 fout.close()
```
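The new default colour (`'0,0,178'`) is meant for rows without a colour. Note that because the loop calls `line.strip()` before `split('\t')`, a row whose colour column is empty (for example `A_peaks.narrowPeak` followed by a tab) loses that trailing tab, splits into a single field, and raises `ValueError` when unpacked into `ifile,colour`. A minimal sketch of the same parsing that keeps an empty colour column, assuming the list file holds `file_name<TAB>colour` rows as the script's help text describes; `read_file_list` is a hypothetical name, not a function in the script.

```python
DEFAULT_COLOUR = '0,0,178'  # same fallback colour as the committed code


def read_file_list(lines, path_prefix=''):
    """Parse 'file_name<TAB>colour' rows into (path, colour) tuples.

    Only the line ending is stripped, so an empty colour column survives the
    split and falls back to DEFAULT_COLOUR; a missing column does too.
    """
    file_list = []
    for line in lines:
        line = line.rstrip('\r\n')
        if not line.strip():
            continue  # skip blank lines
        fields = line.split('\t')
        ifile = fields[0]
        colour = fields[1].strip() if len(fields) > 1 else ''
        if not colour:
            colour = DEFAULT_COLOUR
        file_list.append((path_prefix.strip() + ifile, colour))
    return file_list


rows = ["A.bigWig\t255,0,0\n", "A_peaks.narrowPeak\t\n", "B_peaks.broadPeak\n"]
print(read_file_list(rows, path_prefix='../results/'))
```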
```diff
@@ -74,7 +79,7 @@ def igv_files_to_session(XMLOut,ListFile,Genome):
 XMLStr += '\t<Panel height="1160" name="DataPanel" width="1897">\n'
 for ifile,colour in fileList:
 extension = os.path.splitext(ifile)[1].lower()
-if extension in ['.bed']:
+if extension in ['.bed','.broadpeak','.narrowpeak']:
 XMLStr += '\t\t<Track altColor="0,0,178" autoScale="false" clazz="org.broad.igv.track.FeatureTrack" color="%s" ' % (colour)
 XMLStr += 'displayMode="SQUISHED" featureVisibilityWindow="-1" fontSize="10" height="20" '
 XMLStr += 'id="%s" name="%s" renderer="BASIC_FEATURE" sortable="false" visible="true" windowFunction="count"/>\n' % (ifile,os.path.basename(ifile))
```
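Widening the extension check lets MACS2 peak files, which end in `.narrowPeak` or `.broadPeak`, be written as IGV feature tracks; the `.lower()` call is what allows the lowercase entries in the list to match those mixed-case suffixes. A small sketch of that test in isolation (`is_feature_track` is a hypothetical helper, not part of the script):

```python
import os

# Extensions copied from the updated check in igv_files_to_session.py.
FEATURE_EXTENSIONS = ['.bed', '.broadpeak', '.narrowpeak']


def is_feature_track(path):
    """Mirror the committed check: compare the lowercased final extension."""
    extension = os.path.splitext(path)[1].lower()
    return extension in FEATURE_EXTENSIONS


for path in ['peaks/A_peaks.narrowPeak', 'peaks/B_peaks.broadPeak',
             'consensus.bed', 'bigwig/A.bigWig', 'peaks/A_peaks.bed.gz']:
    print(path, is_feature_track(path))
```

Because `os.path.splitext` returns only the last suffix, a compressed file such as `A_peaks.bed.gz` is not treated as a feature track by this check.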
```diff
@@ -108,7 +113,7 @@ def igv_files_to_session(XMLOut,ListFile,Genome):
 ############################################
 ############################################
 
-igv_files_to_session(XMLOut=args.XML_OUT,ListFile=args.LIST_FILE,Genome=args.GENOME)
+igv_files_to_session(XMLOut=args.XML_OUT,ListFile=args.LIST_FILE,Genome=args.GENOME,PathPrefix=args.PATH_PREFIX)
 
 ############################################
 ############################################
```

bin/igv_get_files.sh

Lines changed: 0 additions & 23 deletions
This file was deleted.
