Commit a97c018

Merge pull request #90 from nf-core/dev
Dev > Master for release
2 parents a76bd69 + 11c2b17 commit a97c018

19 files changed, +455 -266 lines changed

.travis.yml

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ before_install:
   - docker pull nfcore/chipseq:dev
   # Fake the tag locally so that the pipeline runs properly
   # Looks weird when this is :dev to :dev, but makes sense when testing code for a release (:dev to :1.0.1)
-  - docker tag nfcore/chipseq:dev nfcore/chipseq:dev
+  - docker tag nfcore/chipseq:dev nfcore/chipseq:1.0.0
 
 install:
   # Install Nextflow

CHANGELOG.md

Lines changed: 21 additions & 22 deletions
@@ -1,25 +1,24 @@
 # nf-core/chipseq: Changelog
 
-## v1.0dev - [date]
-* Add option for building BWA index for larger ref
-* Update software versions in uppmax-modules
-* Add function to validate input files in MACS config file
-* Remove ngsplot and move the function of plotProfile in deepTools
-* Support reference genome GRCm37
-* Add pre-defined genome sizes for all reference genomes to support macs2 peak calling and downstream processing
-* Add blacklist files for ce11, BDGP6, hg38, and mm9
-* Documents revised accordingly.
-* Major overhaul of docs and assets in-line with nf-core/tools v1.4
-* Added ability to use nf-core/configs along with associated docs
-* Updated manifest scope to deal with pipeline version
-* Removed NGI and SciLifeLab logos, and changed name of pipeline logo
-* Added awsbatch configuration
-* Put file() calls in fromFilePath()
-* Removed --project param specific to UPPMAX
-* Moved appropriate default params variables to nextflow.config
-* Changed Picard memory specification
-* Changed version number back to 1.0dev from 1.0
-* Updated conda packages
-* Major template changes in-line with nf-core/tools v1.5
+All notable changes to this project will be documented in this file.
 
-Repository moved from <https://github.com/SciLifeLab/NGI-ChIPseq>
+The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
+and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html).
+
+## [1.0.0] - 2019-06-06
+
+Initial release of nf-core/chipseq pipeline.
+
+### `Added`
+
+* Raw read QC (FastQC)
+* Adapter trimming (Trim Galore!)
+* Map and filter reads (BWA, picard, SAMtools, BEDTools, BAMTools, Pysam)
+* Create library-size normalised bigWig tracks (BEDTools, bedGraphToBigWig)
+* Alignment QC metrics (Preseq, picard)
+* ChIP-seq QC metrics (deepTools, phantompeakqualtools)
+* Call and annotate broad/narrow peaks (MACS2, HOMER)
+* Create consensus set of peaks per antibody (BEDTools)
+* Quantification and differential binding analysis (featureCounts, DESeq2)
+* Collate appropriate files for genome browser visualisation (IGV)
+* Collate and present various QC metrics (MultiQC, R)

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -4,4 +4,4 @@ LABEL authors="Philip Ewels" \
 
 COPY environment.yml /
 RUN conda env create -f /environment.yml && conda clean -a
-ENV PATH /opt/conda/envs/nf-core-chipseq-1.0dev/bin:$PATH
+ENV PATH /opt/conda/envs/nf-core-chipseq-1.0.0/bin:$PATH

README.md

Lines changed: 13 additions & 11 deletions
@@ -30,15 +30,16 @@ The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool
         * reads that map to different chromosomes ([`Pysam`](http://pysam.readthedocs.io/en/latest/installation.html); *paired-end only*)
         * reads that arent in FR orientation ([`Pysam`](http://pysam.readthedocs.io/en/latest/installation.html); *paired-end only*)
         * reads where only one read of the pair fails the above criteria ([`Pysam`](http://pysam.readthedocs.io/en/latest/installation.html); *paired-end only*)
-    3. Create normalised bigWig files scaled to 1 million mapped reads ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`wigToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/))
-    4. Generate gene-body meta-profile from bigWig files ([`deepTools`](https://deeptools.readthedocs.io/en/develop/content/tools/plotProfile.html))
-    5. Calculate genome-wide IP enrichment relative to control ([`deepTools`](https://deeptools.readthedocs.io/en/develop/content/tools/plotFingerprint.html))
-    6. Calculate strand cross-correlation peak and ChIP-seq quality measures including NSC and RSC ([`phantompeakqualtools`](https://github.com/kundajelab/phantompeakqualtools))
-    7. Call broad/narrow peaks ([`MACS2`](https://github.com/taoliu/MACS))
-    8. Annotate peaks relative to gene features ([`HOMER`](http://homer.ucsd.edu/homer/download.html))
-    9. Create consensus peakset across all samples and create tabular file to aid in the filtering of the data ([`BEDTools`](https://github.com/arq5x/bedtools2/))
-    10. Count reads in consensus peaks ([`featureCounts`](http://bioinf.wehi.edu.au/featureCounts/))
-    11. Differential binding analysis, PCA and clustering ([`R`](https://www.r-project.org/), [`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html))
+    3. Alignment-level QC and estimation of library complexity ([`picard`](https://broadinstitute.github.io/picard/), [`Preseq`](http://smithlabresearch.org/software/preseq/))
+    4. Create normalised bigWig files scaled to 1 million mapped reads ([`BEDTools`](https://github.com/arq5x/bedtools2/), [`bedGraphToBigWig`](http://hgdownload.soe.ucsc.edu/admin/exe/))
+    5. Generate gene-body meta-profile from bigWig files ([`deepTools`](https://deeptools.readthedocs.io/en/develop/content/tools/plotProfile.html))
+    6. Calculate genome-wide IP enrichment relative to control ([`deepTools`](https://deeptools.readthedocs.io/en/develop/content/tools/plotFingerprint.html))
+    7. Calculate strand cross-correlation peak and ChIP-seq quality measures including NSC and RSC ([`phantompeakqualtools`](https://github.com/kundajelab/phantompeakqualtools))
+    8. Call broad/narrow peaks ([`MACS2`](https://github.com/taoliu/MACS))
+    9. Annotate peaks relative to gene features ([`HOMER`](http://homer.ucsd.edu/homer/download.html))
+    10. Create consensus peakset across all samples and create tabular file to aid in the filtering of the data ([`BEDTools`](https://github.com/arq5x/bedtools2/))
+    11. Count reads in consensus peaks ([`featureCounts`](http://bioinf.wehi.edu.au/featureCounts/))
+    12. Differential binding analysis, PCA and clustering ([`R`](https://www.r-project.org/), [`DESeq2`](https://bioconductor.org/packages/release/bioc/html/DESeq2.html))
 6. Create IGV session file containing bigWig tracks, peaks and differential sites for data visualisation ([`IGV`](https://software.broadinstitute.org/software/igv/)).
 7. Present QC for raw read, alignment, peak-calling and differential binding results ([`MultiQC`](http://multiqc.info/), [`R`](https://www.r-project.org/))
 
@@ -57,10 +58,11 @@ The nf-core/chipseq pipeline comes with documentation about the pipeline, found
 ## Credits
 These scripts were orginally written by Chuan Wang ([@chuan-wang](https://github.com/chuan-wang)) and Phil Ewels ([@ewels](https://github.com/ewels)) for use at the [National Genomics Infrastructure](https://portal.scilifelab.se/genomics/) at [SciLifeLab](http://www.scilifelab.se/) in Stockholm, Sweden. It has since been re-implemented by Harshil Patel ([@drpatelh](https://github.com/drpatelh)) from [The Bioinformatics & Biostatistics Group](https://www.crick.ac.uk/research/science-technology-platforms/bioinformatics-and-biostatistics/) at [The Francis Crick Institute](https://www.crick.ac.uk/), London.
 
+Many thanks to others who have helped out along the way too, including (but not limited to): [@apeltzer](https://github.com/apeltzer), [@bc2zb](https://github.com/bc2zb), [@drejom](https://github.com/drejom), [@KevinMenden](https://github.com/KevinMenden), [@pditommaso](https://github.com/pditommaso).
+
 ## Citation
 
-<!-- TODO nf-core: Add citation for pipeline after release. Uncomment lines below and add citation. -->
-<!-- If you use nf-core/chipseq for your analysis, please cite it as follows: -->
+<!-- If you use nf-core/chipseq for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXXX](https://doi.org/10.5281/zenodo.XXXXXXX)
 
 You can cite the `nf-core` pre-print as follows:
 Ewels PA, Peltzer A, Fillinger S, Alneberg JA, Patel H, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. **nf-core: Community curated bioinformatics pipelines**. *bioRxiv*. 2019. p. 610741. [doi: 10.1101/610741](https://www.biorxiv.org/content/10.1101/610741v1).

assets/multiqc/deseq2_clustering_header.txt

Lines changed: 2 additions & 2 deletions
@@ -1,9 +1,9 @@
 #id: 'deseq2_clustering'
 #section_name: 'DESeq2: Sample similarity'
-#description: "is generated from clustering by Euclidean distances between
+#description: " matrix is generated from clustering by Euclidean distances between
 # <a href='https://bioconductor.org/packages/release/bioc/html/DESeq2.html' target='_blank'>DESeq2</a>
 # rlog values for each sample
-# in the <a href='https://github.com/nf-core/atacseq/blob/master/bin/featurecounts_deseq2.r'><code>featurecounts_deseq2.r</code></a> script."
+# (see <a href='https://github.com/nf-core/chipseq/blob/master/bin/featurecounts_deseq2.r'><code>featurecounts_deseq2.r</code></a> script)."
 #plot_type: 'heatmap'
 #anchor: 'nfcore_chipseq-deseq2_clustering'
 #pconfig:

assets/multiqc/deseq2_pca_header.txt

Lines changed: 2 additions & 2 deletions
@@ -1,8 +1,8 @@
 #id: 'deseq2_pca'
 #section_name: 'DESeq2: PCA plot'
-#description: "PCA plot between samples in the experiment.
+#description: "between samples in the experiment.
 # These values are calculated using <a href='https://bioconductor.org/packages/release/bioc/html/DESeq2.html'>DESeq2</a>
-# in the <a href='https://github.com/nf-core/atacseq/blob/master/bin/featurecounts_deseq2.r'><code>featurecounts_deseq2.r</code></a> script."
+# in the <a href='https://github.com/nf-core/chipseq/blob/master/bin/featurecounts_deseq2.r'><code>featurecounts_deseq2.r</code></a> script."
 #plot_type: 'scatter'
 #anchor: 'nfcore_chipseq-deseq2_pca'
 #pconfig:

assets/multiqc/multiqc_config.yaml

Lines changed: 46 additions & 6 deletions
@@ -3,8 +3,6 @@ report_comment: >
     analysis pipeline. For information about how to interpret these results, please see the
     <a href="https://github.com/nf-core/chipseq/blob/master/docs/output.md" target="_blank">documentation</a>.
 
-skip_generalstats: true
-
 export_plots: true
 
 fn_clean_exts:
@@ -18,6 +16,7 @@ fn_clean_exts:
     - 'mLb'
     - '_peaks'
     - '_spp'
+    - '.spp'
 
 module_order:
     - fastqc:
@@ -49,6 +48,11 @@ module_order:
         info: 'This section of the report shows SAMTools results after merging libraries and before filtering.'
         path_filters:
             - '*mLb.mkD.sorted.bam*'
+    - preseq:
+        name: 'Preseq (merged library; unfiltered)'
+        info: 'This section of the report shows Preseq results after merging libraries and before filtering.'
+        path_filters:
+            - '*mLb*'
     - samtools:
         name: 'SAMTools (merged library; filtered)'
         info: 'This section of the report shows SAMTools results after merging libraries and after filtering.'
@@ -83,14 +87,50 @@ report_section_order:
         order: -1400
     peak_annotation:
         order: -1500
-    deseq2_pca:
+    deseq2_pca_1:
         order: -1600
-    deseq2_clustering:
+    deseq2_pca_2:
         order: -1700
-    software_versions:
+    deseq2_pca_3:
         order: -1800
-    nf-core-chipseq-summary:
+    deseq2_pca_4:
         order: -1900
+    deseq2_pca_5:
+        order: -2000
+    deseq2_pca_6:
+        order: -2100
+    deseq2_pca_7:
+        order: -2200
+    deseq2_pca_8:
+        order: -2300
+    deseq2_pca_9:
+        order: -2400
+    deseq2_pca_10:
+        order: -2500
+    deseq2_clustering_1:
+        order: -2600
+    deseq2_clustering_2:
+        order: -2700
+    deseq2_clustering_3:
+        order: -2800
+    deseq2_clustering_4:
+        order: -2900
+    deseq2_clustering_5:
+        order: -3000
+    deseq2_clustering_6:
+        order: -3100
+    deseq2_clustering_7:
+        order: -3200
+    deseq2_clustering_8:
+        order: -3300
+    deseq2_clustering_9:
+        order: -3400
+    deseq2_clustering_10:
+        order: -3500
+    software_versions:
+        order: -3600
+    nf-core-chipseq-summary:
+        order: -3700
 
 custom_plot_config:
     picard-insertsize:
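
The `report_section_order` entries added in the last hunk follow a regular pattern: ten `deseq2_pca_N` sections, ten `deseq2_clustering_N` sections, then the two trailing sections, each 100 lower than the one before. A minimal Python sketch that reproduces those values is shown here for reference only. The helper name `section_order` is hypothetical and not part of the pipeline, and the 4-space YAML indentation in the printed output is an assumption.

```python
# Hypothetical helper (not part of nf-core/chipseq): rebuilds the
# report_section_order values that the hunk spells out by hand.
def section_order(names, start=-1600, step=-100):
    """Map each section name to an 'order' value, decreasing by 100 per entry."""
    return {name: {'order': start + i * step} for i, name in enumerate(names)}

names = (
    ['deseq2_pca_%d' % i for i in range(1, 11)]
    + ['deseq2_clustering_%d' % i for i in range(1, 11)]
    + ['software_versions', 'nf-core-chipseq-summary']
)
order = section_order(names)

# Emit the entries as YAML-style text (indentation is an assumption).
for name, entry in order.items():
    print('    %s:\n        order: %d' % (name, entry['order']))
```

The starting value of -1600 is taken from the first changed entry in the hunk; the sections above it (down to `peak_annotation` at -1500) are untouched by the commit.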

bin/igv_files_to_session.py

Lines changed: 9 additions & 4 deletions
@@ -24,6 +24,9 @@
 argParser.add_argument('XML_OUT', help="XML output file.")
 argParser.add_argument('LIST_FILE', help="Tab-delimited file containing two columns i.e. file_name\tcolour. Header isnt required.")
 argParser.add_argument('GENOME', help="Full path to genome fasta file or shorthand for genome available in IGV e.g. hg19.")
+
+## OPTIONAL PARAMETERS
+argParser.add_argument('-pp', '--path_prefix', type=str, dest="PATH_PREFIX", default='', help="Path prefix to be added at beginning of all files in input list file.")
 args = argParser.parse_args()
 
 ############################################
@@ -47,7 +50,7 @@ def makedir(path):
 ############################################
 ############################################
 
-def igv_files_to_session(XMLOut,ListFile,Genome):
+def igv_files_to_session(XMLOut,ListFile,Genome,PathPrefix=''):
 
     makedir(os.path.dirname(XMLOut))
 
@@ -57,7 +60,9 @@ def igv_files_to_session(XMLOut,ListFile,Genome):
         line = fin.readline()
         if line:
             ifile,colour = line.strip().split('\t')
-            fileList.append((ifile,colour))
+            if len(colour.strip()) == 0:
+                colour = '0,0,178'
+            fileList.append((PathPrefix.strip()+ifile,colour))
         else:
             break
     fout.close()
@@ -74,7 +79,7 @@ def igv_files_to_session(XMLOut,ListFile,Genome):
     XMLStr += '\t<Panel height="1160" name="DataPanel" width="1897">\n'
     for ifile,colour in fileList:
         extension = os.path.splitext(ifile)[1].lower()
-        if extension in ['.bed']:
+        if extension in ['.bed','.broadpeak','.narrowpeak']:
             XMLStr += '\t\t<Track altColor="0,0,178" autoScale="false" clazz="org.broad.igv.track.FeatureTrack" color="%s" ' % (colour)
             XMLStr += 'displayMode="SQUISHED" featureVisibilityWindow="-1" fontSize="10" height="20" '
             XMLStr += 'id="%s" name="%s" renderer="BASIC_FEATURE" sortable="false" visible="true" windowFunction="count"/>\n' % (ifile,os.path.basename(ifile))
@@ -108,7 +113,7 @@ def igv_files_to_session(XMLOut,ListFile,Genome):
 ############################################
 ############################################
 
-igv_files_to_session(XMLOut=args.XML_OUT,ListFile=args.LIST_FILE,Genome=args.GENOME,)
+igv_files_to_session(XMLOut=args.XML_OUT,ListFile=args.LIST_FILE,Genome=args.GENOME,PathPrefix=args.PATH_PREFIX)
 
 ############################################
 ############################################
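
The list-file handling changed in this file (path prefix, default colour, extra peak extensions) can be sketched in isolation as below. This is an illustrative sketch only, not the script's actual code: `read_track_list` and `is_feature_track` are hypothetical names, and the input is passed as a list of lines rather than read from a file. One deliberate difference is that the sketch splits on tabs before stripping whitespace, so a row with an empty colour column still yields a usable entry. In the diff the line is stripped first, which appears to leave a single field for such rows.

```python
import os

# Fallback colour, taken from the value used in the commit.
DEFAULT_COLOUR = '0,0,178'

# Extensions the commit treats as feature tracks (compared in lower case).
FEATURE_EXTENSIONS = ('.bed', '.broadpeak', '.narrowpeak')


def read_track_list(lines, path_prefix=''):
    """Parse 'file_name<TAB>colour' rows into (path, colour) tuples.

    Hypothetical helper: a missing or empty colour falls back to
    DEFAULT_COLOUR, and path_prefix is prepended to every file name.
    """
    tracks = []
    for line in lines:
        line = line.rstrip('\n')
        if not line.strip():
            continue  # skip blank rows
        fields = line.split('\t')  # split first so an empty colour column survives
        ifile = fields[0].strip()
        colour = fields[1].strip() if len(fields) > 1 else ''
        if not colour:
            colour = DEFAULT_COLOUR
        tracks.append((path_prefix.strip() + ifile, colour))
    return tracks


def is_feature_track(path):
    """Return True if the file extension marks a BED-like feature track."""
    return os.path.splitext(path)[1].lower() in FEATURE_EXTENSIONS


tracks = read_track_list(
    ['a.bigWig\t255,0,0\n', 'b.narrowPeak\t\n', '\n'],
    path_prefix='results/',
)
```

Note that the prefix is concatenated as a plain string, as in the commit, so a directory prefix needs its own trailing slash.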

bin/igv_get_files.sh

Lines changed: 0 additions & 23 deletions
This file was deleted.
