Fix class 27: tidy slide markup, drop `.` placeholder in colSums() calls, add plot reference lines

nmukherjee · nmukherjee · commit 7e77ed23b26e · 2025-10-12T05:35:18.000-06:00
diff --git a/exercises/ex-27.qmd b/exercises/ex-27.qmd
@@ -524,7 +524,7 @@ hur_utr3_6mer <- ??(
   width = 6, # k
   as.prob = F,
   simplify.as="matrix") %>%
-  colSums(.) %>%
+  colSums() %>%
   as.data.frame()
 
 colnames(hur_utr3_6mer) <- "hur_utr_count"
@@ -555,7 +555,7 @@ utr3_6mer <- ??(
   width = 6,
   as.prob = F,
   simplify.as="matrix") %>%
-  colSums(.) %>%
+  colSums() %>%
   as.data.frame()
 
 colnames(utr3_6mer) <- "utr_count"
diff --git a/slides/slides-27.qmd b/slides/slides-27.qmd
@@ -57,7 +57,7 @@ For example, an RBP might bind to specific sequences in the 3' UTR of an mRNA an
 
 ## Mapping of RBP binding sites: CLIP-seq {.smaller}
 
-::: columns
+::::: columns
 ::: {.column width="50%"}
 -   Covalent cross-linking of RBPs to RNA using 254 nm UV light.
 -   Lyse cells
@@ -73,11 +73,11 @@ For example, an RBP might bind to specific sequences in the 3' UTR of an mRNA an
 ::: {.column width="50%"}
 ![](/img/block-rna/clipseq.jpg)
 :::
-:::
+:::::
 
 ## Mapping of RBP binding sites: PAR-CLIP {.smaller}
 
-::: columns
+::::: columns
 ::: {.column width="50%"}
 **PAR-CLIP**: photoactivatable ribonucleoside enhanced clip
 
@@ -90,14 +90,12 @@ For example, an RBP might bind to specific sequences in the 3' UTR of an mRNA an
 ::: {.column width="50%"}
 ![](/img/block-rna/nihms832576f1.jpg){width="389"}
 :::
-:::
-
+:::::
 
 ## Mapping of RBP binding sites: analysis {.smaller}
 
-::: columns
-::: {.column width="50%" .nonincremental}
-
+::::: columns
+::: {.column .nonincremental width="50%"}
 Most CLIP-seq approaches have single-nucleotide resolution information. However, they vary in the frequency of that information and the efficiency of the procedure.
 
 The basic concept to **call a peak/binding sites** from CLIP-seq:
@@ -107,62 +105,51 @@ The basic concept to **call a peak/binding sites** from CLIP-seq:
 -   Use nucleotide-level information to define or refine the position of RBP-binding sites
 
 In this class we will be working with PAR-CLIP data. Regardless, I will show you how to access ENCODE eCLIP data. You would easily be able to apply what you learn on those data.
-
 :::
 
 ::: {.column width="50%"}
 ![](/img/block-rna/reads_cliptype.jpg)
 :::
-:::
-
+:::::
 
 ## Analysis overview {.smaller}
 
-::: columns
+::::: columns
 ::: {.column width="50%"}
-
 ![](/img/block-rna/workflow.jpg)
-
 :::
-::: {.column width="50%" .nonincremental}
 
-1. Filter out low quality or short reads (<18 for larger genomes)
-2. Trim adapters
-3. Collapse PCR duplicate reads (molecular indexes)
-4. Align to genome/transcriptome
-5. Call peaks
-6. Downstream analysis
-
-:::
+::: {.column .nonincremental width="50%"}
+1.  Filter out low quality or short reads (\<18 for larger genomes)
+2.  Trim adapters
+3.  Collapse PCR duplicate reads (molecular indexes)
+4.  Align to genome/transcriptome
+5.  Call peaks
+6.  Downstream analysis
 :::
+:::::
 
 ## Pre-processing {.smaller}
 
 [Cutadapt](https://cutadapt.readthedocs.io/en/stable/)
 
-
-
 ![](/img/block-rna/adapters.jpg)
 
-
 ## Calling binding sites: PARalyzer {.smaller}
 
-::: columns
+::::: columns
 ::: {.column width="50%"}
+The pattern of T = \> C conversions, coupled with read density, can thus provide a strong signal to generate a high-resolution map of confident RNA-protein interaction sites.
 
-The pattern of T = > C conversions, coupled with read density, can thus provide a strong signal to generate a high-resolution map of confident RNA-protein interaction sites.
-
-A non-parametric kernel-density estimate used to identify the RNA-protein interaction sites from a combination of T = > C conversions and read density.
+A non-parametric kernel-density estimate is used to identify the RNA-protein interaction sites from a combination of T = \> C conversions and read density.
 
 See [PARalyzer](https://pubmed.ncbi.nlm.nih.gov/21851591/) for more information.
-
 :::
-::: {.column width="50%"}
 
+::: {.column width="50%"}
 ![](/img/block-rna/FMRreads.png)
 :::
-:::
-
+:::::
 
 ## Today's menu {.smaller}
 
@@ -174,18 +161,11 @@ We will be starting with position of the binding sites in the genome (the output
 
 #### 2. Perform motif analysis accounting for the background sequence regions.
 
-
 ![](/img/block-rna/workflow_2hallf.jpg)
 
 ## Annotation of binding sites {.smaller}
 
-
-Where are the binding sites?
-- Which genes?
-- What region of those genes?
-- How many binding sites per region?
-- How many binding sites per gene?
-- How many binding sites per gene by region?
+Where are the binding sites? - Which genes? - What region of those genes? - How many binding sites per region? - How many binding sites per gene? - How many binding sites per gene by region?
 
 We will use `annotatr` and `GRanges` to answer these questions.
 
@@ -215,8 +195,6 @@ annotations <- build_annotations(genome = "hg19", annotations = my_hg19_annots)
 annotations
 ```
 
-
-
 ## Extract annotation categories {.smaller}
 
 What information is contained within the `annotations` object?
@@ -226,7 +204,7 @@ What information is contained within the `annotations` object?
 #| echo: true
 #| label: annot-explore
 
-my_hg19_annots[3]
+
 
 # get introns
 annotation_introns <- annotations[annotations$type == my_hg19_annots[3]]
@@ -247,13 +225,8 @@ annotation_cds <- GenomicRanges::reduce(annotation_cds)
 length_cds <- width(annotation_cds)
 ```
 
-
-
-
-
 ## Compare introns and cds length {.smaller}
 
-
 ```{r}
 #| eval: true
 #| echo: true
@@ -293,7 +266,7 @@ ggplot(all_length, aes(x = nt, color = cat)) +
   theme_cowplot()
 ```
 
-The typical* human intron is way longer than a CDS exon.
+The typical\* human intron is way longer than a CDS exon.
 
 ## ELAVL1/HuR {.smaller .nonincremental}
 
@@ -303,25 +276,20 @@ HuR binds to AU-rich elements (ARE) in 3’ UTRs of mRNAs to promote mRNA stabil
 
 ![](/img/block-rna/hur_mechanism.png)
 
-
 This model makes a few specific predictions:
 
-1. HuR binds  to the 3' UTR.
-2. HuR binds to AU-rich sequences (AUUUA).
-3. HuR binding promotes target RNA stabilization (and binding by the other RBPs to the ARE promotes destabilization).
+1.  HuR binds to the 3' UTR.
+2.  HuR binds to AU-rich sequences (AUUUA).
+3.  HuR binding promotes target RNA stabilization (and binding by the other RBPs to the ARE promotes destabilization).
 
 We will explore these predictions during the next couple classes.
 
 ## PAR-CLIP data {.smaller}
 
-Reminder that we will be using this resource:
-[rag-tag ENCODE](https://github.com/BIMSBbioinfo/RCAS_meta-analysis)
+Reminder that we will be using this resource: [rag-tag ENCODE](https://github.com/BIMSBbioinfo/RCAS_meta-analysis)
 
 We are looking for an ELAVL1 PAR-CLIP corresponding to this SRA (short-read archive) ID: **SRR248532**
 
-
-
-
 ```{r}
 #| eval: true
 #| echo: true
@@ -344,7 +312,6 @@ hur_regions <- hur_regions[hur_regions$score > 1]
 hur_regions
 ```
 
-
 ## Annotate PAR-CLIP data {.smaller}
 
 ```{r}
@@ -387,7 +354,6 @@ hur_annot$annot.type <- gsub("hg19_genes_", "", hur_annot$annot.type)
 table(hur_annot$annot.type)
 ```
 
-
 ## HuR binding region preference {.smaller}
 
 It looks like HuR prefers binding to 3' UTRs and introns. That is a bit of a surprise given the model above indicating 3' UTR binding. Well let's take a step back and frame our expectation using what we know about the genome.
@@ -398,7 +364,6 @@ In this case, how many basepairs are introns and 3' UTRs in the genome?
 
 ## binding region length biases {.smaller}
 
-
 ```{r}
 #| eval: true
 #| echo: true
@@ -427,14 +392,12 @@ for (i in 1:length(my_hg19_annots)) {
 barplot(mylengths[1:4], las = 2, main = "total bases per category", log = "y")
 ```
 
-
 ## Control for CLIP-binding sites {.smaller}
 
 We need a way to figure out a null model OR background expectation.
 
 What if we were to take our HuR binding sites, randomize their positions, and then repeat the annotation on the randomized binding sites?
 
-
 ```{r}
 #| eval: true
 #| echo: true
@@ -511,11 +474,9 @@ ggplot(
 ) +
   geom_bar(stat = "identity") +
   ylab("Observed vs Expected") +
-  theme_cowplot()
+  theme_cowplot() + geom_hline(yintercept = 1)
 ```
 
-
-
 ## 5 MINUTE BREAK
 
 ## What sequence does HuR bind to? {.smaller}
@@ -524,9 +485,9 @@ Is it just *AUUUA*?
 
 **Different transcript regions have different nucleotide composition.**
 
-- 5' UTRs are more GC-rich
+-   5' UTRs are more GC-rich
 
-- 3' UTRs are more AU-rich
+-   3' UTRs are more AU-rich
 
 ![](/img/block-rna/gc.jpg)
 
@@ -538,13 +499,12 @@ Steps to determine k-mer composition (we use 6mers) for any set of intervals
 
 We'll do it for the HuR binding sites and then compare the result to background sequences.
 
-1. Create a `Granges` object for a given annotation category.
-2. Remove duplicated intervals (from diff transcript  ids) with `reduce`.
-3. Retrieve seqeunces using  `getSeqs`
-4. Create a dataframe containing the count and frequency of each 6mer.
-
+1.  Create a `GRanges` object for a given annotation category.
+2.  Remove duplicated intervals (from different transcript IDs) with `reduce`.
+3.  Retrieve sequences using `getSeq`
+4.  Create a dataframe containing the count and frequency of each 6mer.
 
-## Calculate 6mers in HuR sites   {.smaller}
+## Calculate 6mers in HuR sites {.smaller}
 
 Since HuR preferentially binds to 3' UTRs, that is the region we will focus on.
 
@@ -575,7 +535,7 @@ hur_utr3_6mer <- oligonucleotideFrequency(
   as.prob = F,
   simplify.as = "matrix"
 ) |>
-  colSums(.) |>
+  colSums() |>
   as.data.frame()
 
 colnames(hur_utr3_6mer) <- "hur_utr_count"
@@ -585,10 +545,9 @@ hur_utr3_6mer$hur_utr_freq <- hur_utr3_6mer$hur_utr_count /
   sum(hur_utr3_6mer$hur_utr_count)
 ```
 
+## Calculate 6mers in 3utrs {.smaller}
 
-## Calculate 6mers in 3utrs   {.smaller}
-
-Next, we will calculate 6mer frequencies in 3' UTRs. This will serve as a null model or background that we can compare with the HuR  binding  site 6mers.
+Next, we will calculate 6mer frequencies in 3' UTRs. This will serve as a null model or background that we can compare with the HuR binding site 6mers.
 
 ```{r}
 #| eval: true
@@ -610,7 +569,7 @@ utr3_6mer <- oligonucleotideFrequency(
   as.prob = F,
   simplify.as = "matrix"
 ) |>
-  colSums(.) |>
+  colSums() |>
   as.data.frame()
 
 colnames(utr3_6mer) <- "utr_count"
@@ -621,7 +580,6 @@ utr3_6mer$utr_freq <- utr3_6mer$utr_count / sum(utr3_6mer$utr_count)
 
 ## Sequences enriched in hur sites vs 3utr {.smaller}
 
-
 ```{r}
 #| eval: true
 #| echo: true
@@ -653,11 +611,11 @@ ggplot(
   ylab("kmers: HuR 3'UTR sites") +
   xlab("kmers: 3'UTR") +
   geom_abline(intercept = 0, slope = 1) +
+  geom_abline(intercept = 0, slope = 8) +
   geom_text_repel(aes(label = ifelse(hur_enrich > 8, rownames(utr3_df), ""))) +
   theme_cowplot()
 ```
 
-
 ## Sequences enriched in hur sites vs 3utr {.smaller}
 
 Not just AU-rich...