forked from abyzovlab/CNVnator
README file for CNVnator software distribution
0. Sample run
=============
$ ./cnvnator -root file.root -tree file.bam -unique
$ ./cnvnator -root file.root -his 1000 -chrom 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y
OR
$ ./cnvnator -root file.root -his 1000 -chrom chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY
$ ./cnvnator -root file.root -stat 1000 -d dir_with_genome_fa/
$ ./cnvnator -root file.root -partition 1000
$ ./cnvnator -root file.root -call 1000
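The steps of the sample run can be wrapped in a small shell function that stops
at the first failing step. This is only a sketch under stated assumptions: the
names run_pipeline and CNVNATOR are made up for illustration, cnvnator is assumed
to return a non-zero exit status on failure (this README does not say so), and
the chr*.fa sequence files are assumed to be in the running directory, so neither
-chrom nor -d is passed.

```shell
# Hypothetical wrapper around the five steps of the sample run.
# CNVNATOR and run_pipeline are illustrative names, not part of CNVnator.
CNVNATOR=${CNVNATOR:-./cnvnator}

# usage: run_pipeline file.bam file.root bin_size calls.txt
run_pipeline() {
    bam=$1 root=$2 bin=$3 calls=$4
    # Each step must succeed before the next one starts.
    "$CNVNATOR" -root "$root" -tree "$bam" -unique    || return 1
    "$CNVNATOR" -root "$root" -his "$bin"             || return 1
    "$CNVNATOR" -root "$root" -stat "$bin"            || return 1
    "$CNVNATOR" -root "$root" -partition "$bin"       || return 1
    # Calls are printed to STDOUT, so capture them in a file.
    "$CNVNATOR" -root "$root" -call "$bin" > "$calls" || return 1
}
```

With the function defined, a run would look like
run_pipeline file.bam file.root 1000 calls.txt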
1. Compilation
==============
You must install the ROOT package (http://root.cern.ch) and set the $ROOTSYS
environment variable (see the ROOT documentation).
$ cd src/samtools
$ make
If compilation does not complete but the file libbam.a has been created, you
can continue.
$ cd ../
$ make
If make fails, try "make OMP=no", which disables parallel (OpenMP) support.
>>>Installing with Yeppp support
Yeppp (http://www.yeppp.info/) is a library that provides high-performance
implementations of math functions. To install with Yeppp support, download Yeppp
from http://bitbucket.org/MDukhan/yeppp/downloads/yeppp-1.0.0.tar.bz2 and extract
it at a location of your choice. Set YEPPPLIBDIR and YEPPPINCLUDEDIR to the
corresponding directories. Typically, for Linux-based systems on x86-64,
YEPPPLIBDIR will be yeppp-1.0.0/binaries/linux/x86_64/ and YEPPPINCLUDEDIR will
be yeppp-1.0.0/library/headers. To build, type
$ make YEPPPLIBDIR=... YEPPPINCLUDEDIR=...
To disable OpenMP as well, add OMP=no to the make command.
2. Predicting CNV regions
=========================
Running CNVnator involves the few steps outlined below. Chromosome names and
lengths are parsed from the SAM/BAM file header. One can override this default
behavior by using the -genome option.
>>>EXTRACTING READ MAPPING FROM BAM/SAM FILES
$ ./cnvnator [-genome name] -root out.root [-chrom name1 ...] -tree [file1.bam ...]
out.root -- output ROOT file. See ROOT package documentation.
name1 -- chromosome name.
file1.bam -- BAM file(s).
Chromosome names must be specified exactly as they appear in the SAM/BAM
header, e.g., chrX or X. Multiple chromosomes can be specified, separated by
spaces. If no chromosome is specified, read mapping is extracted for all
chromosomes in the SAM/BAM file. Note that this requires a machine with a large
amount of physical memory (7 GB). Extracting read mapping for subsets of
chromosomes is a way around this issue. Also note that an existing ROOT file is
not overwritten, so extraction can be split across several runs on the same file.
To get a correct q0 field for CNV calls (see below), use the option -unique
when extracting read mapping from BAM/SAM files.
Example:
./cnvnator -root NA12878.root -chrom 1 2 3 -tree NA12878_ali.bam
for bam files with a header like this:
@HD VN:1.4 GO:none SO:coordinate
@SQ SN:1 LN:249250621
@SQ SN:2 LN:243199373
@SQ SN:3 LN:198022430
...
or
./cnvnator -root NA12878.root -chrom chr1 chr2 chr3 -tree NA12878_ali.bam
for bam files with a header like this:
@HD VN:1.4 GO:none SO:coordinate
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
...
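To check which naming style a file uses before choosing -chrom values, one can
list the @SQ names from its header. A minimal sketch, assuming a samtools
executable is available on the PATH; sq_names is a made-up helper name, not part
of CNVnator.

```shell
# Hypothetical helper: read SAM header text on stdin and print one
# chromosome name (the SN: value of each @SQ line) per line.
sq_names() {
    awk '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)   # drop the "SN:" prefix
    }'
}
```

It could be used as
samtools view -H NA12878_ali.bam | sq_names
where "samtools view -H" prints only the header.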
Example:
./cnvnator -root NA12878.root -chrom 4 5 6 -tree NA12878_ali.bam
./cnvnator -root NA12878.root -chrom 7 8 9 -tree NA12878_ali.bam
is equivalent to
./cnvnator -root NA12878.root -chrom 4 5 6 7 8 9 -tree NA12878_ali.bam
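Because successive -tree runs on the same ROOT file are equivalent to a single
run, extraction can be scripted in batches of chromosomes to limit memory use.
The following is a sketch only: extract_in_batches and CNVNATOR are hypothetical
names, the batch size is the user's choice, and cnvnator is assumed to return a
non-zero exit status on failure.

```shell
# Hypothetical helper, not part of CNVnator.
CNVNATOR=${CNVNATOR:-./cnvnator}

# usage: extract_in_batches out.root file.bam batch_size < names.txt
# Reads chromosome names (one per line) on stdin and runs one -tree
# extraction per batch, all into the same ROOT file.
extract_in_batches() {
    root=$1 bam=$2 size=$3
    batch="" n=0
    while read -r name; do
        batch="$batch $name"
        n=$((n + 1))
        if [ "$n" -eq "$size" ]; then
            # $batch is left unquoted on purpose: each name is one argument.
            "$CNVNATOR" -root "$root" -chrom $batch -tree "$bam" </dev/null || return 1
            batch="" n=0
        fi
    done
    # Run the last, possibly shorter, batch.
    if [ "$n" -gt 0 ]; then
        "$CNVNATOR" -root "$root" -chrom $batch -tree "$bam" </dev/null || return 1
    fi
}
```

For example, with a file names.txt holding one chromosome name per line:
extract_in_batches NA12878.root NA12878_ali.bam 3 < names.txt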
>>>GENERATING A HISTOGRAM
$ ./cnvnator [-genome name] -root file.root [-chrom name1 ...] -his bin_size [-d dir]
This step does not consume much memory and so can be done for all chromosomes
at once. It can, of course, also be carried out for a subset of chromosomes.
Files with chromosome sequences are required and should reside in the running
directory or in the directory specified by the -d option. Files should be named
as: chr1.fa, chr2.fa, etc.
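Reference genomes are often distributed as one multi-sequence FASTA file, while
this step expects one file per chromosome. A minimal sketch of splitting such a
file with awk follows. split_fasta is a made-up helper name, and the sketch
assumes the sequence names in the FASTA file are already chr1, chr2, etc., so
that the resulting files get the names required above.

```shell
# Hypothetical helper, not part of CNVnator.
# usage: split_fasta genome.fa out_dir
# Writes each sequence of genome.fa to out_dir/<name>.fa, where <name> is
# the text between ">" and the first blank on the sequence's header line.
split_fasta() {
    genome=$1 outdir=$2
    mkdir -p "$outdir" || return 1
    awk -v dir="$outdir" '
        /^>/ {
            if (out != "") close(out)    # keep only one output file open
            out = dir "/" substr($1, 2) ".fa"
        }
        out != "" { print > out }
    ' "$genome"
}
```

The output directory can then be passed to the histogram step with -d.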
>>>CALCULATING STATISTICS
$ ./cnvnator -root file.root [-chrom name1 ...] -stat bin_size
This step must be completed before proceeding to partitioning and CNV calling.
>>>RD SIGNAL PARTITIONING
$ ./cnvnator -root file.root [-chrom name1 ...] -partition bin_size [-ngc]
Option -ngc specifies that the GC-corrected RD signal should not be used.
Partitioning is the most time-consuming step.
>>>CNV CALLING
$ ./cnvnator -root file.root [-chrom name1 ...] -call bin_size [-ngc]
Calls are printed to STDOUT.
The output is as follows:
CNV_type coordinates CNV_size normalized_RD e-val1 e-val2 e-val3 e-val4 q0
normalized_RD -- read depth (RD) normalized to 1.
e-val1 -- calculated using t-test statistics.
e-val2 -- calculated from the probability that RD values within the region fall
in the tails of a Gaussian distribution describing frequencies of RD values in bins.
e-val3 -- same as e-val1 but for the middle of the CNV.
e-val4 -- same as e-val2 but for the middle of the CNV.
q0 -- fraction of reads in the region mapped with q0 (zero) mapping quality.
To have correct output of q0 field one needs to use the option -unique when extracting read mapping from bam/sam files.
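Since the calls are plain columns on STDOUT, they can be filtered with awk. The
sketch below keeps calls whose e-val1 (column 5) and q0 (column 9) do not exceed
given limits. filter_calls is a made-up helper name, and the default limits of
0.05 and 0.5 are illustrative assumptions only, not values recommended by this
README.

```shell
# Hypothetical helper, not part of CNVnator.
# usage: filter_calls [max_e-val1] [max_q0] < calls.txt
# Columns follow the output layout described above:
#   1 CNV_type, 2 coordinates, 3 CNV_size, 4 normalized_RD,
#   5 e-val1, 6 e-val2, 7 e-val3, 8 e-val4, 9 q0
filter_calls() {
    max_eval=${1:-0.05}    # illustrative default, an assumption
    max_q0=${2:-0.5}       # illustrative default, an assumption
    awk -v e="$max_eval" -v q="$max_q0" '
        # "+ 0" forces a numeric comparison, including values like 1e-10.
        NF >= 9 && ($5 + 0) <= (e + 0) && ($9 + 0) <= (q + 0)
    '
}
```

For example:
./cnvnator -root file.root -call 1000 | filter_calls 0.01 0.5 > filtered_calls.txt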
>>>MERGING ROOT FILES
$ ./cnvnator [-genome name] -root out.root [-chrom name ...] -merge file1.root ...
Merging can be used when combining read mappings extracted from multiple files.
Note that histogram generation, statistics calculation, signal partitioning, and
CNV calling must be completed or redone after merging.
>>>VISUALIZING SPECIFIED REGIONS
$ ./cnvnator -root file.root [-chrom chr_name1 ...] -view bin_size [-ngc]
Once prompted, enter a genomic region, e.g.,
>12:11396601-11436500
or
>chr12:11396601-11436500
or
>12 11396601 11436500
or
>chr12 11396601 11436500
Additionally, one can specify the length of the flanking regions to display as
well (default is 10 kb), e.g.,
>12:11396601-11436500 100000
or
>chr12:11396601-11436500 100000
or
>12 11396601 11436500 100000
or
>chr12 11396601 11436500 100000
One can also perform instant genotyping by adding the word 'genotype', e.g.,
>12:11396601-11436500 genotype
or
>chr12:11396601-11436500 genotype
or
>12 11396601 11436500 genotype
or
>chr12 11396601 11436500 genotype
3. Genotyping genomic regions
=============================
For efficient genotype calculations, we recommend sorting the list of regions by
chromosome.
$ ./cnvnator -root file.root -genotype bin_size [-ngc]
Once prompted, enter a genomic region, e.g.,
>12:11396601-11436500
or
>chr12:11396601-11436500
or
>12 11396601 11436500
or
>chr12 11396601 11436500
One can also perform instant visualization by adding the word 'view', e.g.,
>12:11396601-11436500 view
or
>chr12:11396601-11436500 view
or
>12 11396601 11436500 view
or
>chr12 11396601 11436500 view
To genotype multiple regions, one can supply them on standard input, e.g., with a here-document:
./cnvnator -root NA12878.root -genotype 100 << EOF
12:11396601-11436500
22:20999401-21300400
exit
EOF
Another example, piping the coordinates column of a calls file:
awk '{ print $2 } END { print "exit" }' calls.cnvnator | ./cnvnator -root NA12878.root -genotype 100
Please send your comments and suggestions to abyzov.alexej@mayo.edu.