11 changes: 6 additions & 5 deletions .github/workflows/quarto.yaml
@@ -23,15 +23,16 @@ jobs:
run: |
quarto add --no-prompt quarto-ext/fontawesome
quarto add --no-prompt sellorm/quarto-social-embeds
- quarto add --no-prompt r-wasm/quarto-drop
+ # quarto add --no-prompt r-wasm/quarto-drop
quarto add --no-prompt quarto-ext/pointer
- quarto add --no-prompt r-wasm/quarto-live
+ # quarto add --no-prompt r-wasm/quarto-live
quarto add --no-prompt gadenbuie/quarto-auto-dark

- name: 🔧 Install R
uses: r-lib/actions/setup-r@v2
with:
use-public-rspm: true
- r-version: 'renv'
+ r-version: "renv"

- name: 🔁 Install system dependencies
run: |
@@ -63,10 +64,10 @@ jobs:
if: github.event_name == 'pull_request'
uses: nwtgck/actions-netlify@v3.0
with:
- publish-dir: './_site'
+ publish-dir: "./_site"
production-branch: main
github-token: ${{ secrets.GITHUB_TOKEN }}
- deploy-message: 'Deploy from GHA: ${{ github.event.pull_request.title || github.event.head_commit.message }} (${{ github.sha }})'
+ deploy-message: "Deploy from GHA: ${{ github.event.pull_request.title || github.event.head_commit.message }} (${{ github.sha }})"
enable-commit-comment: false
enable-github-deployment: false
env:
15 changes: 15 additions & 0 deletions _freeze/exercises/ex-19/execute-results/html.json
@@ -0,0 +1,15 @@
{
"hash": "2736eeed9e0c9e19f4078f987cc5121a",
"result": {
"engine": "knitr",
"markdown": "---\ntitle: \"Chromatin accessibility II\"\nsubtitle: \"Meta-plots and heatmaps\"\nauthor: \"{{< var instructor.block.dna >}}\"\ndate: last-modified\n---\n\n## Genomewide chromatin analysis with meta-plots and heatmaps\n\nLast class we saw what the different methods to profile chromatin\naccessibility can tell us about general chromatin structure and possible regulation\nat specific regions in a small portion of a chromosome.\n\nWe also want to make sure these conclusions are valid throughout the genome.\nSince we want to keep the file sizes small, we will ask if they are valid across\nan entire chromosome.\n\n## Load libraries {.smaller}\n\nFirst we will plot the profiles of all our data sets relative to the\ntranscription start site (TSS), where all the action seems to be happening.\n\n\n::: {.cell}\n\n:::\n\n\n## Load data {.smaller}\n\nFirst, we need to load relevant files:\n\n- `yeast_tss_chrII.bed.gz` contains transcription start sites (TSS) for genes on yeast chromosome 2.\n- `sacCer3.chrom.sizes` contains the sizes of all yeast chromosomes, needed for some of the calculations we'll do. We'll grab this from the UCSC download site.\n\n`read_bed()` and `read_genome()` are valr functions.\n\n\n::: {.cell}\n\n:::\n\n\n## Load signals {.smaller}\n\nNext we'll load bigWigs for the ATAC and MNase experiments, containing either short or long fragments.\n\nRecall that the information encoded in short and long fragments should be reflected in our interpretations.\n\nFirst, we make a tibble of file paths and metadata.\n\n::: {.cell}\n\n:::\n\n\n---\n\nNext, we need to read in the bigWig files. 
We use `purrr::map` to apply `read_bigwig()`\nto each of the bigWig files, and store the results in a new column called `big_wig`.\n\n\n::: {.cell}\n\n:::\n\n\n# Meta-plots\n\n## WHY meta-plots and heatmaps?\n\nMeta-plots and heatmaps are useful for visualizing patterns of signal across many\nloci at the same time.\n\nThis is particularly useful for chromatin data, where we often want to\nunderstand how chromatin structure varies across many genes or regulatory elements\nthat share a common reference point, like transcription start sites (TSS) or\nnucleosomal midpoints.\n\n## Setting up regions for a meta-plot\n\nNext, we need to set up some windows for analyzing signal relative to each TSS.\nThis is an important step that will ultimately impact our interpretations.\n\nIn genomic meta-plots, you first decide on a window size relevant to the\nfeatures you are measuring, and then make \"windows\" around a reference point,\nspanning some distance both up- and downstream. If the features involve gene\nfeatures, we also need to take strand into account.\n\n## Setting up regions for a meta-plot\n\nReference points could be:\n\n- transcription start or end sites\n- boundaries of exons and introns\n- enhancers\n- centromeres and telomeres\n\n## Setting up regions for a meta-plot\n\nThe window size should be relevant to the reference points, such that small- or\nlarge-scale features are emphasized in the plot. Moreover, the window typically\nspans some distance both up- and downstream of the reference points.\n\n## Setting up regions for a meta-plot\n\nOnce the window size has been decided, the next step is to make \"sub-windows\"\naround a reference point. 
If gene features are involved, we also need to take\nstrand into account.\n\n## Setting up regions for a meta-plot\n\nFor small features like transcription factor binding sites (8-20 bp), you might\nset up smaller windows (maybe 1 bp) at a distance \\~20 bp up- and downstream of\na reference point.\n\nFor larger features like nucleosome positions or chromatin domains, you might\nset up larger windows (\\~200 bp) at distances up to \\~10 kbp up- and downstream\nof a set of reference points.\n\n## Metaplot workflow\n\n![Metaplot workflow overview](../img/block-dna/metaplot-workflow.png)\n\n## Chromatin accessibility around transcription start sites (TSSs) {.smaller}\n\n\n::: {.cell output-location='column'}\n\n:::\n\n\n## Chromatin accessibility around transcription start sites (TSSs) {.smaller}\n\nNext, we'll use two valr functions to expand the window of the reference\npoint (`bed_slop()`) and then break those windows into evenly spaced intervals\n(`bed_makewindows()`).\n\n\n::: {.cell output-location='column'}\n\n:::\n\n\n## Chromatin accessibility around transcription start sites (TSSs)\n\nAt this point, we also address the fact that the TSS data are stranded. 
Can someone describe what the issue is here, based on the figure above?\n\n\n::: {.cell}\n\n:::\n\n\n## Chromatin accessibility around transcription start sites (TSSs) {.smaller}\n\nThis next step uses valr `bed_map()` to calculate the total signal for each\nwindow by intersecting signals from the bigWig files.\n\n\n::: {.cell output-location='column'}\n\n:::\n\n\n## Chromatin accessibility around transcription start sites (TSSs) {.smaller}\n\nOnce we have the values from `bed_map()`, we can group by `win_coord` and\ncalculate a summary statistic for each window.\n\nRemember that `win_coord` is the same relative position for each TSS, so these\nnumbers represent a composite signal at the same position across all TSSs.\n\n\n::: {.cell output-location='column'}\n\n:::\n\n\n## Meta-plot of signals around TSSs {.smaller}\n\nFinally, let's plot the data relative to TSS for each of the windows.\n\n\n::: {.cell output-location='slide'}\n\n:::\n\n\n## Interpreting the meta-plots\n\n- What is the direction of transcription in these meta-plots?\n\n- What are the features of chromatin near TSS measured by these different experimental conditions?\n\n- How do you interpret the increased signal of the +1 nucleosome in the \"MNase_Long\" condition, relative to e.g. -1, +2, +3, etc.?\n\n- What are the differences in ATAC and MNase treatments that lead to these distinctive patterns?\n\n# Heatmaps\n\n## Heatmap of signals around TSSs\n\nTo generate a heatmap, we need to reformat our data slightly.\n\nTake a look at `acc_tbl` and think about how you might reorganize the following way:\n\n- rows contain the data for individual loci (i.e., each TSS)\n- columns are ordered positions relative to the TSS (i.e., most upstream to most downstream)\n\n## Heatmap of signals around TSSs {.smaller}\n\nWe're going to plot a heatmap of the \"MNase_Long\" data. 
There are two ways\nto get these data:\n\n\n::: {.cell}\n\n:::\n\n\n## Or, using dplyr / tidyr: {.smaller}\n\n\n::: {.cell output-location='column'}\n\n:::\n\n\n## Heatmap of signals around TSSs {.smaller}\n\nEither way, now we need to reformat the data.\n\n\n::: {.cell output-location='slide'}\n\n:::\n\n\n## Heatmap of signals around TSSs\n\nOnce we have the data reformatted, we just convert to a matrix and feed it to\n`ComplexHeatmap::Heatmap()`.\n\n\n::: {.cell output-location='slide'}\n\n:::\n\n\n## Interpreting meta-plots and heatmaps\n\nIt's worth considering what meta-plots and heatmaps *can* and *can't* tell you.\n\n1. What are the similarities and differences between heatmaps and meta-plots?\n\n2. What types of conclusions can you draw from each type of plot?\n\n3. What are some features of MNase-seq and ATAC-seq that become more clear when\nlooking across many loci at the same time?\n\n4. What are some hypotheses you can generate based on these plots?\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": {},
"postProcess": true
}
}
15 changes: 15 additions & 0 deletions _freeze/exercises/ex-20/execute-results/html.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
{
"hash": "e5211f049067d5b8a126a6577c2baccd",
"result": {
"engine": "knitr",
"markdown": "---\ntitle: \"Where do proteins bind in the genome?\"\nauthor: \"{{< var instructor.block.dna >}}\"\ndate: last-modified\n---\n\n## What to map and how to map it?\n\n::: columns\n::: {.column width=\"50%\"}\n**Targets**\n\n- Transcription factors\n- Histone modifications\n- Chromatin remodelers\n- RNA polymerases\n- Other factors that bind chromatin\n:::\n\n::: {.column width=\"50%\"}\n**Methods**\n\n- ChIP-seq\n- MNase-ChIP-seq\n- CUT&RUN\n- CUT&Tag\n:::\n:::\n\n##\n\n![](../img/block-dna/ChIPseq_2.png){fig-align=\"center\"}\n\n##\n\n![](../img/block-dna/ChIP_Data.png){fig-align=\"center\"}\n\n##\n\n![](../img/block-dna/cut-and-run.png){fig-align=\"center\"}\n\n##\n\n![](../img/block-dna/chip-resolution-comparison.png){fig-align=\"center\"}\n\n## Comparison of factor-centric methods {.smaller}\n\n::: columns\n::: {.column width=\"50%\"}\n![](../img/block-dna/chip-comparison-overview.png)\n:::\n\n::: {.column width=\"50%\"}\n| Method | Resolution | Sequencing cost |\n|:--------------:|:----------:|:---------------:|\n| ChIP-seq | Low | High |\n| MNase-ChIP-seq | High | High |\n| CUT&RUN | High | Low |\n:::\n:::\n\n# Workflow\n\n### FASTQ files\n\n- Adapter trimming\n\n- Aligning to the genome\n\n### Bed files\n\n- Generate read density genome-wide\n\n### Read density (wig/bedgraph)\n\n- Call peaks\n\n- Meta analysis\n\n- Identify motifs\n\n- Compare perturbations to control, compare to other datasets\n\n## Example data: CTCF CUT&RUN in K562 cells\n\n![](../img/block-dna/ctcf_cut_run_track.png){fig-align=\"center\" width=\"800\"}\n\n## Example data: CTCF CUT&RUN in K562 cells\n\n![](../img/block-dna/ctcf_cut_run_meta.png){fig-align=\"center\" width=\"106\"}\n\n(from Skene and Henikoff, eLife 2017)\n\n## Where do transcription factors bind in the genome?\n\nToday we'll look at where two yeast transcription factors bind in the genome using CUT&RUN.\n\n## Where do transcription factors bind in the genome? 
{.smaller}\n\nTechniques like CUT&RUN require an affinity reagent (e.g., an antibody) that uniquely recognizes a transcription factor in the cell.\n\n1. Antibody is added to permeabilized cells, and the antibody associates with the epitope.\n1. A separate reagent, a fusion of Protein A (which binds IgG) and micrococcal nuclease (MNase), then associates with the antibody.\n1. Addition of calcium activates MNase, and nearby DNA is digested.\n1. These DNA fragments are then isolated and sequenced to identify sites of TF association in the genome.\n\n## Where do transcription factors bind in the genome?\n\n![Fig 1a, Skene et al.](../img/block-dna/skene-fig-1a.png)\n\n## Data download and pre-processing {.smaller}\n\nCUT&RUN data were downloaded from the [NCBI GEO page](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE84474) for Skene et al.\n\nI selected the 16 second time point for *S. cerevisiae* Abf1 and Reb1 (note the paper combined data from the 1-32 second time points).\n\nBED files containing mapped DNA fragments were separated by size and converted to bigWig with:\n\n``` bash\n# separate fragments by size\nawk '($3 - $2 <= 120)' Abf1.bed > CutRun_Abf1_lt120.bed\nawk '($3 - $2 >= 150)' Abf1.bed > CutRun_Abf1_gt150.bed\n\n# for each file with the different sizes\nbedtools genomecov -i Abf1.bed -g sacCer3.chrom.sizes -bg > Abf1.bg\nbedGraphToBigWig Abf1.bg sacCer3.chrom.sizes Abf1.bw\n```\n\nThe bigWig files are available here in the `data/` directory.\n\n## Questions\n\n1. How do you ensure your antibody recognizes what you think it recognizes? What are important controls for ensuring it recognizes a specific epitope?\n\n2. 
What are some good controls for CUT&RUN experiments?\n\n# CUT&RUN analysis\n\n## Set up libraries\n\n\n::: {.cell}\n\n:::\n\n\n## Examine genome coverage\n\n\n::: {.cell}\n\n:::\n\n\n\n## Examine genome coverage {.smaller}\n\n\n::: {.cell}\n\n:::\n\n\n## Examine genome coverage\n\n\n::: {.cell}\n\n:::\n\n\n## Examine genome coverage\nNow that we have tracks loaded, we can make a plot.\n\n\n::: {.cell output-location='slide'}\n\n:::\n\n\n## Questions\n\n1. What features stand out in the above tracks? What is different between Reb1 and Abf1? Between the short and long fragments?\n\n2. Where are the major signals with respect to genes?\n\n## Peak calling\n\nA conceptually simple approach to identification of regions containing \"peaks\" where a transcription factor was bound is available in the MACS software ([paper](), [github]()). There's also a nice [blog post](https://hbctraining.github.io/Intro-to-ChIPseq/lessons/05_peak_calling_macs.html) covering the main ideas.\n\n## Theory\n\nThe Poisson distribution is a discrete probability distribution of the form:\n\n$$ P_\\lambda (X=k) = \\frac{ \\lambda^k e^{-\\lambda} }{ k! } $$\n\nwhere $\\lambda$ captures both the mean and variance of the distribution.\n\nThe R functions `dpois()`, `ppois()`, and `rpois()` provide access to the density, distribution, and random generation for the Poisson distribution.\n\nLook over the `?dpois` documentation.\n\n## Theory\n\n\n::: {.cell}\n\n:::\n\n\n## Practice\n\nHere, we model read coverage using the Poisson distribution. Given some genome size $G$ and a number of reads collected $N$, we can approximate $\\lambda$ from $N/G$.\n\nMACS uses this value (the \"genomewide\" lambda) and also calculates several \"local\" lambda values to account for variation among genomic regions. 
We'll just use the genomewide lambda, which provides a conservative threshold for peak calling.\n\nUsing the genomewide lambda, we can use the Poisson distribution to address the question: **How surprised should I be to see** $k$ reads at position X?\n\n---\n\n\n::: {.cell}\n\n:::\n\n\n## P-values\n\nLet's take a look at a plot of the p-value across a chromosome. What do you notice about this plot, when compared to the CUT&RUN coverage above?\n\n\n::: {.cell output-location='slide'}\n\n:::\n\n\n## Peaks\n\nHow many peaks are called in this region?\n\n\n::: {.cell output-location='slide'}\n\n:::\n\n\n## Visualize\n\nLet's visualize these peaks in the context of genomic CUT&RUN signal. We need to define an `AnnotationTrack` with the peak intervals, which we can plot against the CUT&RUN coverage we defined above.\n\nLet us load the data:\n\n\n::: {.cell output-location='slide'}\n\n:::\n\n\n## Visualize\n\nAnd plot:\n\n\n::: {.cell output-location='slide'}\n\n:::\n\n\n## Questions\n\n1. How many peaks were called throughout the genome? How wide are the called peaks, on average?\n\n2. How else might we define a significance threshold for identifying peaks?\n\n3. What might the relative heights of the peaks indicate? What types of technical or biological variables might influence peak heights?\n",
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
"includes": {},
"engineDependencies": {},
"preserve": {},
"postProcess": true
}
}