Skip to content

Commit 7e77ed2

Browse files
committed
fix class 27
1 parent 8dbe22a commit 7e77ed2

File tree

2 files changed

+43
-85
lines changed

2 files changed

+43
-85
lines changed

exercises/ex-27.qmd

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -524,7 +524,7 @@ hur_utr3_6mer <- ??(
524524
width = 6, # k
525525
as.prob = F,
526526
simplify.as="matrix") %>%
527-
colSums(.) %>%
527+
colSums() %>%
528528
as.data.frame()
529529
530530
colnames(hur_utr3_6mer) <- "hur_utr_count"
@@ -555,7 +555,7 @@ utr3_6mer <- ??(
555555
width = 6,
556556
as.prob = F,
557557
simplify.as="matrix") %>%
558-
colSums(.) %>%
558+
colSums() %>%
559559
as.data.frame()
560560
561561
colnames(utr3_6mer) <- "utr_count"

slides/slides-27.qmd

Lines changed: 41 additions & 83 deletions
Original file line numberDiff line numberDiff line change
@@ -57,7 +57,7 @@ For example, an RBP might bind to specific sequences in the 3' UTR of an mRNA an
5757

5858
## Mapping of RBP binding sites: CLIP-seq {.smaller}
5959

60-
::: columns
60+
::::: columns
6161
::: {.column width="50%"}
6262
- Covalent cross-linking of RBPs to RNA using 254 nm UV light.
6363
- Lyse cells
@@ -73,11 +73,11 @@ For example, an RBP might bind to specific sequences in the 3' UTR of an mRNA an
7373
::: {.column width="50%"}
7474
![](/img/block-rna/clipseq.jpg)
7575
:::
76-
:::
76+
:::::
7777

7878
## Mapping of RBP binding sites: PAR-CLIP {.smaller}
7979

80-
::: columns
80+
::::: columns
8181
::: {.column width="50%"}
8282
**PAR-CLIP**: photoactivatable ribonucleoside enhanced clip
8383

@@ -90,14 +90,12 @@ For example, an RBP might bind to specific sequences in the 3' UTR of an mRNA an
9090
::: {.column width="50%"}
9191
![](/img/block-rna/nihms832576f1.jpg){width="389"}
9292
:::
93-
:::
94-
93+
:::::
9594

9695
## Mapping of RBP binding sites: analysis {.smaller}
9796

98-
::: columns
99-
::: {.column width="50%" .nonincremental}
100-
97+
::::: columns
98+
::: {.column .nonincremental width="50%"}
10199
Most CLIP-seq approaches have single-nucleotide resolution information. However, they vary in the frequency of that information and the efficiency of the procedure.
102100

103101
The basic concept to **call a peak/binding sites** from CLIP-seq:
@@ -107,62 +105,51 @@ The basic concept to **call a peak/binding sites** from CLIP-seq:
107105
- Use nucleotide level information to de/refine position of RBP-binding sites
108106

109107
In this class we will be working with PAR-CLIP data. Regardless, I will show you how to access ENCODE eCLIP data. You would easily be able to apply what you learn on those data.
110-
111108
:::
112109

113110
::: {.column width="50%"}
114111
![](/img/block-rna/reads_cliptype.jpg)
115112
:::
116-
:::
117-
113+
:::::
118114

119115
## Analysis overview {.smaller}
120116

121-
::: columns
117+
::::: columns
122118
::: {.column width="50%"}
123-
124119
![](/img/block-rna/workflow.jpg)
125-
126120
:::
127-
::: {.column width="50%" .nonincremental}
128121

129-
1. Filter out low quality or short reads (<18 for larger genomes)
130-
2. Trim adapters
131-
3. Collapse PCR duplicate reads (molecular indexes)
132-
4. Align to genome/transcriptome
133-
5. Call peaks
134-
6. Downstream analysis
135-
136-
:::
122+
::: {.column .nonincremental width="50%"}
123+
1. Filter out low quality or short reads (\<18 for larger genomes)
124+
2. Trim adapters
125+
3. Collapse PCR duplicate reads (molecular indexes)
126+
4. Align to genome/transcriptome
127+
5. Call peaks
128+
6. Downstream analysis
137129
:::
130+
:::::
138131

139132
## Pre-processing {.smaller}
140133

141134
[Cutadapt](https://cutadapt.readthedocs.io/en/stable/)
142135

143-
144-
145136
![](/img/block-rna/adapters.jpg)
146137

147-
148138
## Calling binding sites: PARalyzer {.smaller}
149139

150-
::: columns
140+
::::: columns
151141
::: {.column width="50%"}
142+
The pattern of T = \> C conversions, coupled with read density, can thus provide a strong signal to generate a high-resolution map of confident RNA-protein interaction sites.
152143

153-
The pattern of T = > C conversions, coupled with read density, can thus provide a strong signal to generate a high-resolution map of confident RNA-protein interaction sites.
154-
155-
A non-parametric kernel-density estimate used to identify the RNA-protein interaction sites from a combination of T = > C conversions and read density.
144+
A non-parametric kernel-density estimate used to identify the RNA-protein interaction sites from a combination of T = \> C conversions and read density.
156145

157146
See [PARalyzer](https://pubmed.ncbi.nlm.nih.gov/21851591/) for more information.
158-
159147
:::
160-
::: {.column width="50%"}
161148

149+
::: {.column width="50%"}
162150
![](/img/block-rna/FMRreads.png)
163151
:::
164-
:::
165-
152+
:::::
166153

167154
## Today's menu {.smaller}
168155

@@ -174,18 +161,11 @@ We will be starting with position of the binding sites in the genome (the output
174161

175162
#### 2. Perform motif analysis accounting for the background sequence regions.
176163

177-
178164
![](/img/block-rna/workflow_2hallf.jpg)
179165

180166
## Annotation of binding sites {.smaller}
181167

182-
183-
Where are the binding sites?
184-
- Which genes?
185-
- What region of those genes?
186-
- How many binding sites per region?
187-
- How many binding sites per gene?
188-
- How many binding sites per gene by region?
168+
Where are the binding sites? - Which genes? - What region of those genes? - How many binding sites per region? - How many binding sites per gene? - How many binding sites per gene by region?
189169

190170
We will use `annotatr` and `Granges` to answer these questions.
191171

@@ -215,8 +195,6 @@ annotations <- build_annotations(genome = "hg19", annotations = my_hg19_annots)
215195
annotations
216196
```
217197

218-
219-
220198
## Extract annotation categories {.smaller}
221199

222200
What information is contained within the `annotations` object?
@@ -226,7 +204,7 @@ What information is contained within the `annotations` object?
226204
#| echo: true
227205
#| label: annot-explore
228206
229-
my_hg19_annots[3]
207+
230208
231209
# get introns
232210
annotation_introns <- annotations[annotations$type == my_hg19_annots[3]]
@@ -247,13 +225,8 @@ annotation_cds <- GenomicRanges::reduce(annotation_cds)
247225
length_cds <- width(annotation_cds)
248226
```
249227

250-
251-
252-
253-
254228
## Compare introns and cds length {.smaller}
255229

256-
257230
```{r}
258231
#| eval: true
259232
#| echo: true
@@ -293,7 +266,7 @@ ggplot(all_length, aes(x = nt, color = cat)) +
293266
theme_cowplot()
294267
```
295268

296-
The typical* human intron is way longer than a CDS exon.
269+
The typical\* human intron is way longer than a CDS exon.
297270

298271
## ELAVL1/HuR {.smaller .nonincremental}
299272

@@ -303,25 +276,20 @@ HuR binds to AU-rich elements (ARE) in 3’ UTRs of mRNAs to promote mRNA stabil
303276

304277
![](/img/block-rna/hur_mechanism.png)
305278

306-
307279
This model makes a few specific prediction:
308280

309-
1. HuR binds to the 3' UTR.
310-
2. HuR binds to AU-rich sequences (AUUUA).
311-
3. HuR binding promotes target RNA stabilization (and binding by the other RBPs to the ARE promotes destabilization).
281+
1. HuR binds to the 3' UTR.
282+
2. HuR binds to AU-rich sequences (AUUUA).
283+
3. HuR binding promotes target RNA stabilization (and binding by the other RBPs to the ARE promotes destabilization).
312284

313285
We will explore these predictions during the next couple classes.
314286

315287
## PAR-CLIP data {.smaller}
316288

317-
Reminder that we will be using this resource:
318-
[rag-tag ENCODE](https://github.com/BIMSBbioinfo/RCAS_meta-analysis)
289+
Reminder that we will be using this resource: [rag-tag ENCODE](https://github.com/BIMSBbioinfo/RCAS_meta-analysis)
319290

320291
We are looking for an ELAVL1 PAR-CLIP corresponding to this SRA (short-read archive) ID: **SRR248532**
321292

322-
323-
324-
325293
```{r}
326294
#| eval: true
327295
#| echo: true
@@ -344,7 +312,6 @@ hur_regions <- hur_regions[hur_regions$score > 1]
344312
hur_regions
345313
```
346314

347-
348315
## Annotate PAR-CLIP data {.smaller}
349316

350317
```{r}
@@ -387,7 +354,6 @@ hur_annot$annot.type <- gsub("hg19_genes_", "", hur_annot$annot.type)
387354
table(hur_annot$annot.type)
388355
```
389356

390-
391357
## HuR binding region preference {.smaller}
392358

393359
It looks like HuR prefers binding to 3' UTRs and introns. That is a bit of a surprise given the model above indicating 3' UTR binding. Well let's take a step back and frame our expectation using what we know about the genome.
@@ -398,7 +364,6 @@ In this case, how many basepairs are introns and 3' UTRs in the genome?
398364

399365
## binding region length biases {.smaller}
400366

401-
402367
```{r}
403368
#| eval: true
404369
#| echo: true
@@ -427,14 +392,12 @@ for (i in 1:length(my_hg19_annots)) {
427392
barplot(mylengths[1:4], las = 2, main = "total bases per category", log = "y")
428393
```
429394

430-
431395
## Control for CLIP-binding sites {.smaller}
432396

433397
We need a way to figure out a null model OR background expectation.
434398

435399
What if we were to take our HuR binding and randomize their position and then repeat the annotation on the randomized binding sites?
436400

437-
438401
```{r}
439402
#| eval: true
440403
#| echo: true
@@ -511,11 +474,9 @@ ggplot(
511474
) +
512475
geom_bar(stat = "identity") +
513476
ylab("Observed vs Expected") +
514-
theme_cowplot()
477+
theme_cowplot() + geom_hline(yintercept = 1)
515478
```
516479

517-
518-
519480
## 5 MINUTE BREAK
520481

521482
## What sequence does HuR bind to? {.smaller}
@@ -524,9 +485,9 @@ Is it just *AUUUA*?
524485

525486
**Different transcript regions have different nucleotide composition.**
526487

527-
- 5' UTRs are more GC-rich
488+
- 5' UTRs are more GC-rich
528489

529-
- 3' UTRs are more AU-rich
490+
- 3' UTRs are more AU-rich
530491

531492
![](/img/block-rna/gc.jpg)
532493

@@ -538,13 +499,12 @@ Steps to determine k-mer composition (we use 6mers) for any set of intervals
538499

539500
We'll do it for both HuR binding sites and then compare it to background seqs.
540501

541-
1. Create a `Granges` object for a given annotation category.
542-
2. Remove duplicated intervals (from diff transcript ids) with `reduce`.
543-
3. Retrieve seqeunces using `getSeqs`
544-
4. Create a dataframe containing the count and frequency of each 6mer.
545-
502+
1. Create a `Granges` object for a given annotation category.
503+
2. Remove duplicated intervals (from diff transcript ids) with `reduce`.
504+
3. Retrieve seqeunces using `getSeqs`
505+
4. Create a dataframe containing the count and frequency of each 6mer.
546506

547-
## Calculate 6mers in HuR sites {.smaller}
507+
## Calculate 6mers in HuR sites {.smaller}
548508

549509
Since HuR preferentially binds to 3' UTRs, that is the region we will focus on.
550510

@@ -575,7 +535,7 @@ hur_utr3_6mer <- oligonucleotideFrequency(
575535
as.prob = F,
576536
simplify.as = "matrix"
577537
) |>
578-
colSums(.) |>
538+
colSums() |>
579539
as.data.frame()
580540
581541
colnames(hur_utr3_6mer) <- "hur_utr_count"
@@ -585,10 +545,9 @@ hur_utr3_6mer$hur_utr_freq <- hur_utr3_6mer$hur_utr_count /
585545
sum(hur_utr3_6mer$hur_utr_count)
586546
```
587547

548+
## Calculate 6mers in 3utrs {.smaller}
588549

589-
## Calculate 6mers in 3utrs {.smaller}
590-
591-
Next, we will calculate 6mer frequencies in 3' UTRs. This will serve as a null model or background that we can compare with the HuR binding site 6mers.
550+
Next, we will calculate 6mer frequencies in 3' UTRs. This will serve as a null model or background that we can compare with the HuR binding site 6mers.
592551

593552
```{r}
594553
#| eval: true
@@ -610,7 +569,7 @@ utr3_6mer <- oligonucleotideFrequency(
610569
as.prob = F,
611570
simplify.as = "matrix"
612571
) |>
613-
colSums(.) |>
572+
colSums() |>
614573
as.data.frame()
615574
616575
colnames(utr3_6mer) <- "utr_count"
@@ -621,7 +580,6 @@ utr3_6mer$utr_freq <- utr3_6mer$utr_count / sum(utr3_6mer$utr_count)
621580

622581
## Sequences enriched in hur sites vs 3utr {.smaller}
623582

624-
625583
```{r}
626584
#| eval: true
627585
#| echo: true
@@ -653,11 +611,11 @@ ggplot(
653611
ylab("kmers: HuR 3'UTR sites") +
654612
xlab("kmers: 3'UTR") +
655613
geom_abline(intercept = 0, slope = 1) +
614+
geom_abline(intercept = 0, slope = 8) +
656615
geom_text_repel(aes(label = ifelse(hur_enrich > 8, rownames(utr3_df), ""))) +
657616
theme_cowplot()
658617
```
659618

660-
661619
## Sequences enriched in hur sites vs 3utr {.smaller}
662620

663621
Not just AU-rich...

0 commit comments

Comments
 (0)