<style>
.midcenter {
position: fixed;
top: 50%;
left: 50%;
}
.reveal {
font-size: 28px;
}
</style>
circRNA Detection
========================================================
author: lijiaping@genomics.cn
date: 2015-12-11
<a href="https://github.com/gahoo/circRNA_Rpres" style="position:absolute; bottom:0;" align="middle">
<img alt="barcode" src="barcode.png" align="middle"></img>
<br>
Source Code
</a>
***
<img alt="cover" src="cover.jpg" align="bottom"></img>
What's circRNA
========================================================
Circular RNA (or circRNA) is a type of RNA which, unlike the better known linear RNA, forms a covalently closed continuous loop, i.e., in circular RNA the 3' and 5' ends normally present in an RNA molecule have been joined together. This feature confers numerous properties to circular RNAs, many of which have only recently been identified.

Biological Function
========================================================
<a href="https://www.pinterest.com/pin/280349145526550796/"><img src="function.jpg" width="75%"></img></a>
- miR sponge
- Potential functions
  - Binding to RNA-binding proteins (RBPs) to form RNA-protein complexes
  - Protein production
  - miR transport
  - mRNA regulation
Biological Function
========================================================
<img src="ElciRNAs-model.png" width="100%"></img>
EIciRNAs might hold factors such as U1 snRNP through specific RNA-RNA interaction between U1 snRNA and EIciRNA, and then the EIciRNA-U1 snRNP complexes might further interact with the Pol II transcription complex at the promoters of parental genes to enhance gene expression.
<small>
Li Z, Huang C, Bao C, et al. Exon-intron circular RNAs regulate transcription in the nucleus[J]. Nature structural & molecular biology, 2015, 22(3): 256-264.
</small>
circRNA Detection
========================================================
- find_circ
  - [Circular RNAs are a large class of animal RNAs with regulatory potency](http://dx.doi.org/10.1038/nature11928)
- CIRCexplorer
  - [Complementary Sequence-Mediated Exon Circularization](http://dx.doi.org/10.1016/j.cell.2014.09.001)
- CIRI
  - [CIRI: an efficient and unbiased algorithm for de novo circular RNA identification](http://dx.doi.org/10.1186%2Fs13059-014-0571-3)

========================================================
<div class="midcenter" style="font-size: 4em;">find_circ</div>
find_circ
========================================================
<img src="find_circ.flow.png" align="middle"></img>
***
1. GU/AG flanking the splice sites;
2. an unambiguous breakpoint;
3. a maximum of two mismatches in the extension procedure;
4. the breakpoint resides no more than 2 nucleotides inside an anchor;
5. at least two independent reads support the junction;
6. unique anchor alignments with a safety margin to the next-best alignment of at least one anchor above 35 points;
7. the two splice sites are no more than 100 kb apart.

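The seven criteria above can be read as a single boolean filter. The sketch below is only an illustration, not find_circ's own code: the `Candidate` record and all of its field names are hypothetical, and GU/AG is written as GT/AG because the motifs are read from the DNA reference.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of one back-splice junction candidate."""
    chrom: str
    start: int              # genomic start of the putative circle
    end: int                # genomic end of the putative circle
    donor_motif: str        # two intronic bases next to the donor site
    acceptor_motif: str     # two intronic bases next to the acceptor site
    n_breakpoints: int      # number of equally good breakpoint positions
    mismatches: int         # mismatches found while extending the anchors
    overlap_in_anchor: int  # nucleotides the breakpoint lies inside an anchor
    n_reads: int            # independent reads supporting the junction
    best_margin: int        # best anchor's score margin over the next-best hit


def passes_filters(c, max_span=100_000):
    """Apply the seven criteria listed on this slide, in order."""
    return (
        c.donor_motif == "GT" and c.acceptor_motif == "AG"  # 1. GU/AG signal
        and c.n_breakpoints == 1                            # 2. unambiguous
        and c.mismatches <= 2                               # 3. extension
        and c.overlap_in_anchor <= 2                        # 4. breakpoint
        and c.n_reads >= 2                                  # 5. read support
        and c.best_margin > 35                              # 6. safety margin
        and (c.end - c.start) <= max_span                   # 7. <= 100 kb
    )
```

A candidate supported by a single read, or spanning more than 100 kb, is rejected even if every other criterion holds.
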
========================================================
<div class="midcenter" style="font-size: 4em;">CIRCexplorer</div>
CIRCexplorer
========================================================
<img src="CIRCexplorer.flow.jpg" align="middle"></img>
***
1. Candidate back-spliced junction reads: unmapped by TopHat but mapped by TopHat-Fusion to the same chromosome in a noncolinear order.
2. These reads are realigned against the gene annotation to determine the precise positions of the downstream donor and upstream acceptor splice sites.
3. Exonic circular RNAs are annotated with the support of the identified junction reads.
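What "noncolinear ordering" means in step 1 can be shown with a small check on the two aligned segments of a split read. This is a minimal sketch, not CIRCexplorer code: the function name and the `(chrom, strand, start, end)` tuple layout (half-open coordinates, segments given in read order) are assumptions.

```python
def is_backspliced(seg1, seg2):
    """Return True if two read segments map in noncolinear order.

    seg1 is the 5' part of the read and seg2 the 3' part; each is a
    (chrom, strand, start, end) tuple with half-open coordinates.
    """
    chrom1, strand1, start1, end1 = seg1
    chrom2, strand2, start2, end2 = seg2
    if chrom1 != chrom2 or strand1 != strand2:
        return False  # different chromosome or strand: not a back-splice
    if strand1 == "+":
        # Colinear: the 5' segment lies upstream. Back-splice: downstream.
        return start1 >= end2
    # On the minus strand the genomic order is mirrored.
    return end1 <= start2
```

For a read on the plus strand, `is_backspliced(("chr1", "+", 500, 550), ("chr1", "+", 100, 150))` is true, whereas the same two segments in the opposite read order describe ordinary linear splicing.
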
Orientation-Opposite Alu Elements
========================================================
- On average, there were about three Alu elements in both upstream and downstream flanking introns of circularized exons.
- Alu elements could form inverted repeated Alu pairs (IRAlus), either convergent (G) or divergent (H).
- IRAlus pairing might bring a splice donor site at a downstream exon and a splice acceptor site at an upstream exon close to each other and promote back splicing.

Alternative Circularization
========================================================

Circular RNAs and their back-spliced junction reads (numbers) from H9 poly(A)-/RNase R RNA-seq are indicated by arcs. Different shades of black indicate different numbers of junction reads.

Competition of RNA Pairing Affects Exon Circularization
========================================================
The competition models of RNA pairing by complementary sequence-mediated exon circularization.
- (Left) The RNA pairing by IRAlus within one individual intron promotes **normal constitutive splicing**, resulting in a linearized RNA transcript with exon inclusion and no circularization.
- (Right) RNA pairing by IRAlus across flanking introns promotes **back splicing**, leading to a linearized RNA transcript with exon skipping and circularization.

circRNA biogenesis
========================================================
[*Analysis of Intron Sequences Reveals Hallmarks of Circular RNA Biogenesis in Animals*](http://dx.doi.org/10.1016/j.celrep.2014.12.019)
- Reverse complementary matches are a conserved feature of circRNA biogenesis
- Computational analysis of intron sequences allows successful prediction of circRNAs
- The circRNA formation mechanism depends on the RNA-editing enzyme ADAR
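
As a toy illustration of a "reverse complementary match" between the two introns flanking an exon, the sketch below lists k-mers of the upstream intron whose reverse complement occurs in the downstream intron. It is not the method of the cited paper; the function names and the default `k=8` are assumptions made for the example.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def shared_rc_kmers(upstream, downstream, k=8):
    """k-mers of `upstream` whose reverse complement occurs in `downstream`."""
    upstream_kmers = {upstream[i:i + k] for i in range(len(upstream) - k + 1)}
    # A k-mer's reverse complement is in `downstream` exactly when the
    # k-mer itself is a substring of reverse_complement(downstream).
    rc_downstream = reverse_complement(downstream)
    return sorted(kmer for kmer in upstream_kmers if kmer in rc_downstream)
```

Such shared k-mers mark stretches that could base-pair across the flanking introns and bring the splice sites together.
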
***

========================================================
<div class="midcenter" style="font-size: 4em;">CIRI</div>
Challenges
========================================================
1. a large proportion of circRNAs have relatively **low abundance** compared with their linear counterparts, while most RNA-seq data were generated **without a circRNA enrichment step**, such as *RNase R treatment*, which makes it difficult to accurately distinguish circRNAs from false positives caused by noise in RNA-seq data;
2. existing **annotations** of reference genomes were mainly **based on linear RNA transcript analyses**, which are not applicable for circRNA identification, and non-model organisms often have incomplete gene annotation or even **lack gene annotation**;
3. **read lengths vary** in different sequencing data sets, which challenges unbiased identification of circRNAs;
4. complexities of eukaryotic transcription may generate other non-canonical transcripts, such as **lariats and fusion genes**, in which corresponding reads similar to circular junctions may **lead to false discoveries**.
Related works
========================================================
- Salzman [*Cell-type specific features of circular RNA expression*](http://www.genomebiology.com/pubmed/23446346)
  - alignment to customized **annotated exon boundaries**
  - filtration not effective on **low coverage regions**
- Memczak [*Circular RNAs are a large class of animal RNAs with regulatory potency*](http://www.genomebiology.com/pubmed/23446348) (find_circ)
  - utilizes **GT-AG splicing signals** flanking exons
  - adopts a **two-segment alignment** of split reads
  - **inability to detect short exon-flanking circRNAs**
  - filtration insufficient for removal of **false positives**
- Jeck [*Circular RNAs are abundant, conserved, and associated with ALU repeats*](http://www.genomebiology.com/pubmed/22319583)
  - **compares untreated and RNase-treated** results to **confirm** circRNA candidates and **remove false positives**
  - sensitive and able to estimate circRNAs' **relative abundance**
  - **introduces systematic bias** in the enrichment procedure
  - not applicable to RNA-seq data without circRNA enrichment
What's CIGAR
========================================================
From [wiki](http://genome.sph.umich.edu/wiki/SAM#What_is_a_CIGAR.3F)
The CIGAR string is a sequence of base lengths and the associated operations.
For example:
```
RefPos:     1  2  3  4     5  6  7  8  9 10 11 12 13 14 15
Reference:  T  A  C  T     G  A  A  C  T  G  A  C  T  A  A
Read:          A  C  T  A  G  A  A     T  G  G  C  T
```
With the alignment above, you get:
```
POS: 2
CIGAR: 3M1I3M1D5M
```
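
A CIGAR string can be decoded with a few lines of Python. The sketch below is a minimal illustration using only the standard library; the helper names `parse_cigar`, `reference_span` and `read_length` are made up for this slide and do not come from any SAM toolkit.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def reference_span(pos, cigar):
    """First and last 1-based reference positions covered by the alignment."""
    # Only M, D, N, = and X consume reference bases.
    ref_len = sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")
    return pos, pos + ref_len - 1


def read_length(cigar):
    """Number of read bases described by the CIGAR (M, I, S, = and X)."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MIS=X")
```

For the example above, `reference_span(2, "3M1I3M1D5M")` gives `(2, 13)` and `read_length("3M1I3M1D5M")` gives `12`, the length of the read `ACTAGAATGGCT`.
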
What's CIGAR | clipped alignment
========================================================
From [davetang.org](http://davetang.org/wiki/tiki-index.php?page=SAM#Clipped_alignment)
A sequence may *not be aligned from the first residue to the last one*. Subsequences at the ends may be clipped off. We introduce operation **'S'** to describe **clipped alignment**. One query sequence may be aligned to **multiple places** on the reference genome, either with or without overlaps. In SAM, we keep multiple hits as multiple alignment records. To avoid presenting the full query sequence multiple times for non-overlapping hits, we introduce operation **'H'** to describe **hard clipped alignment**.
clipped alignment:
```
REF:   AGCTAGCATCGTGTCGCCCGTCTAGCATACGCATGATCGACTGTC
READ:         gggGTGTAACC-GACTAGgggg
CIGAR: 3S8M1D6M4S
CIGAR: 3H8M1D6M4H (multi-part alignment)
sequence stored: GGGGTGTAACCGACTAGGGGG
sequence stored: GTGTAACCGACTAG (multi-part alignment)
```
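
The difference between soft and hard clipping is only in what is stored in the record. A minimal sketch, assuming a hypothetical helper `stored_sequence` that is not part of any SAM library:

```python
import re


def stored_sequence(read, cigar):
    """Sequence kept in the SAM record for a given full read and CIGAR.

    Soft-clipped (S) bases stay in the stored sequence; hard-clipped (H)
    bases at either end are removed from it.
    """
    ops = [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]
    start, end = 0, len(read)
    if ops and ops[0][1] == "H":
        start = ops[0][0]       # drop leading hard-clipped bases
    if len(ops) > 1 and ops[-1][1] == "H":
        end -= ops[-1][0]       # drop trailing hard-clipped bases
    return read[start:end].upper()
```

With the read from the example, `3S8M1D6M4S` keeps all 21 bases, while `3H8M1D6M4H` keeps only the 14 aligned bases `GTGTAACCGACTAG`.
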
PCC (paired chiastic clipping) signal
========================================================
<img src="CIRI.types.gif" width="100%"></img>
- A. typical circRNA: xS/HyM or xMyS/H
- B. short-exon flanking circRNA: (x+y)S/HzM or xM(y+z)S/H
- C. small circRNA: xS/HyMzS/H
- D. paired-end mapping (PEM)
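
The CIGAR forms in A to C can be recognised with regular expressions. The sketch below is an illustration under stated assumptions, not CIRI's implementation: the function name, the `(position, cigar)` tuple layout and the `tolerance` parameter are hypothetical, and it assumes that in a PCC pair the `xMyS/H` alignment lies downstream of the matching `xS/HyM` alignment.

```python
import re

HEAD_CLIP = re.compile(r"^(\d+)[SH](\d+)M$")  # xS/HyM
TAIL_CLIP = re.compile(r"^(\d+)M(\d+)[SH]$")  # xMyS/H


def is_pcc_pair(aln1, aln2, tolerance=2):
    """Check whether two alignments of one read form a PCC signal.

    Each alignment is a (position, cigar) tuple. The clipped part of one
    alignment must be the matched part of the other, and the xMyS/H
    alignment must lie downstream of the xS/HyM alignment.
    """
    for head, tail in ((aln1, aln2), (aln2, aln1)):
        h = HEAD_CLIP.match(head[1])
        t = TAIL_CLIP.match(tail[1])
        if not (h and t):
            continue
        clip_h, match_h = int(h.group(1)), int(h.group(2))
        match_t, clip_t = int(t.group(1)), int(t.group(2))
        lengths_agree = (abs(clip_h - match_t) <= tolerance
                         and abs(match_h - clip_t) <= tolerance)
        if lengths_agree and tail[0] > head[0]:
            return True
    return False
```

The same two CIGAR strings in the opposite genomic order look like an ordinary linear splice and are not reported.
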
PEM (paired-end mapping) signal
========================================================
<img src="CIRI.types.gif" width="100%"></img>
- D. paired-end mapping (PEM)
- **PEM signals** - the paired read of a junction read aligns within the inferred circRNA region;
- **GT-AG or AT-AC splicing signals** in junctions;
- mapping **quantity** and **quality**, and mapping **read length** in junctions.
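
The first two filters are simple to state in code. This is a minimal sketch with hypothetical function names and inclusive coordinates, not CIRI's own code:

```python
def mate_supports_circle(mate_start, mate_end, circ_start, circ_end):
    """PEM signal: the mate aligns entirely inside the inferred circRNA."""
    return circ_start <= mate_start and mate_end <= circ_end


def has_splice_signal(donor_dinucleotide, acceptor_dinucleotide):
    """Accept GT-AG or AT-AC dinucleotides flanking the junction."""
    pair = (donor_dinucleotide.upper(), acceptor_dinucleotide.upper())
    return pair in {("GT", "AG"), ("AT", "AC")}
```

A mate that maps outside the inferred circle argues against the junction, since both reads of a pair come from the same molecule.
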
CIRI | pipeline
========================================================
The CIRI pipeline for detecting circRNAs from transcriptome data. (DP: dynamic programming).
<img src="CIRI.flow.gif" width="80%"></img>
CIRI | second scan
========================================================
CIRI scans the SAM alignment a second time and thoroughly examines **each read**.
Information such as the **CIGAR value**, **mapping position** and **mapping quality** of a read and of **its paired read** is taken into account.
Any **read mapping to the related region** of a putative circRNA junction in the SAM alignment is aligned, using a **dynamic programming algorithm**, with the related junction reads identified in the first scan, to decide whether it supports the junction of a **circRNA** or the corresponding **linear transcript**.
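
The decision step can be pictured with a classic dynamic programming edit distance. The sketch below is a simplification and not CIRI's algorithm: it compares a read with two candidate sequences, one spanning the circRNA junction and one from the linear transcript, and both function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # match or mismatch
        prev = curr
    return prev[-1]


def classify_read(read, circ_junction, linear_transcript):
    """Say which candidate sequence the read agrees with better."""
    d_circ = edit_distance(read, circ_junction)
    d_linear = edit_distance(read, linear_transcript)
    if d_circ < d_linear:
        return "circRNA"
    if d_linear < d_circ:
        return "linear"
    return "ambiguous"
```

A read that fits both sequences equally well is left undecided rather than counted for either transcript.
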
CIRI | multiple mapping filtration
========================================================
This step also adds a further filter to **prevent false predictions** resulting from the similarity of **homologous genes** or **repetitive sequences**:
- records similar or identical reads with a **distinct mapping position** from the junction reads
- summarizes **mapping positions of all related reads** of a candidate junction
- compares **counts** and **mapping** lengths of the reads
  - Do the reads reliably reflect a circRNA junction?
  - Will the candidate circRNA be kept?
- requires a minimum relative expression (counting junction and **non-**junction reads)

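
The last filter can be sketched as a ratio of junction reads to all reads at the junction. The formula, the function names and both thresholds below are assumptions chosen for illustration; the slide does not state the values CIRI uses.

```python
def junction_ratio(junction_reads, non_junction_reads):
    """Fraction of reads at the junction that support the circRNA."""
    total = junction_reads + non_junction_reads
    return junction_reads / total if total else 0.0


def keep_candidate(junction_reads, non_junction_reads,
                   min_ratio=0.05, min_junction_reads=2):
    """Keep a candidate with enough junction reads, in number and share.

    Both thresholds are hypothetical values for this example.
    """
    return (junction_reads >= min_junction_reads
            and junction_ratio(junction_reads, non_junction_reads) >= min_ratio)
```

A candidate with 5 junction reads among 100 reads at the site is kept under these example thresholds, while one with 2 junction reads among 100 is dropped.
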
========================================================
<div class="midcenter" style="font-size: 4em;">End</div>