treatment vs control pairwise analysis ArchR #696
-
Hello! I'd like to ask if there are any field best practices for pseudobulk profile generation and peak calling when looking at differences in chromatin accessibility in a cell type across conditions. For example, the ArchR manual states that "ArchR makes multiple such pseudo-bulk samples for each desired cell grouping, hence the term pseudo-bulk replicates. The underlying assumption in this process is that the single cells that are being grouped together are sufficiently similar that we do not care to understand the differences between them."
Now, pseudobulk profile generation is done on a per-sample basis. So, as long as you ensure your profiles are generated per sample by specifying replicates and min cells, you should be okay to generate them based on cluster. Peak calling becomes the next issue. Peak calling is done on the pseudobulk replicates, using MACS2. The iterative overlap method is where this becomes confusing for me. It's unclear to me what one should consider when adding the reproducible peak set. The manual only explains the iterative overlap peak method using cell types. If I group by sample, how does this differ?
I would love some insight from anyone doing this analysis on what has worked for them. I'm at an epistemological struggle with my data: I do see some differences, but my inexperience with this type of analysis makes me skeptical, because I've performed it so many different ways and gotten different results. I've been working on this data for about two months now and am at a loss, so any dialogue with others would be incredible for me.
Also, to the ArchR creators, thanks so much for this package. It's been the easiest-to-use ATAC-seq workflow for me by far, and I've not found a better resource for understanding both how ATAC-seq is performed and what downstream analysis is possible with ATAC-seq data. You've done amazing work!
Thanks, Casey
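For readers unfamiliar with the iterative overlap idea mentioned above, here is a minimal Python sketch of the general technique as the ArchR manual describes it: rank peaks by significance, keep the most significant one, drop everything that overlaps it, and repeat. This is not ArchR's code. The `Peak` tuple layout, the function names, and the toy coordinates and scores are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical peak record: (chromosome, start, end, significance score).
Peak = Tuple[str, int, int, float]


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two half-open intervals on the same chromosome overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def iterative_overlap(peaks: List[Peak]) -> List[Peak]:
    """Keep the most significant peak, discard peaks overlapping it, repeat.

    Walking the peaks in descending score order and keeping a peak only if it
    overlaps nothing already kept is equivalent to that iterative removal.
    """
    kept: List[Peak] = []
    for peak in sorted(peaks, key=lambda p: p[3], reverse=True):
        if not any(overlaps(peak, k) for k in kept):
            kept.append(peak)
    return sorted(kept, key=lambda p: (p[0], p[1]))


# Toy fixed-width peaks pooled from several pseudobulk replicates (made up).
pooled = [
    ("chr1", 100, 601, 10.0),   # overlaps the stronger peak below, so dropped
    ("chr1", 400, 901, 50.0),   # strongest peak in this neighborhood
    ("chr1", 950, 1451, 20.0),  # does not overlap the strongest peak, so kept
    ("chr2", 100, 601, 5.0),    # alone on its chromosome, so kept
]
merged = iterative_overlap(pooled)
print(merged)
```

The grouping variable only changes which pseudobulk replicates feed the pool of candidate peaks; the merge step itself is the same whether the groups are clusters, cell types, or something else.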
Replies: 3 comments 8 replies
-
Thanks for your kind words about the package and documentation. Always nice to hear that it is working for people since most of our feedback comes in the form of bug reports.
Can you clarify this a bit more? In case the documentation isn't clear, there is a difference between making pseudobulk replicates on a per-sample basis and "grouping by sample". If you group based on cluster (or cell type), ArchR tries to prevent you from making multiple pseudobulk replicates that fail to capture biological variability. For example, if you took all cells in a cluster and divided them into three equal-sized but randomly selected groups, you would have multiple biological donors per pseudobulk replicate, and this would obscure biological variability. Instead, ArchR attempts to create pseudobulk replicates that contain cells from only a specific sample. This is what we refer to as sample-aware. But the grouping is still being performed on a cluster. Does that clarify things?
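The distinction in the reply above can be illustrated with a small Python sketch. It is a toy model, not ArchR's implementation: the function names, the `min_cells` threshold, and the donor labels are assumptions, and it ignores whatever fallback behavior ArchR applies when a sample has too few cells in a cluster.

```python
import random
from collections import defaultdict


def sample_aware_replicates(cells, min_cells=1):
    """One replicate per sample: each replicate holds cells from a single donor."""
    by_sample = defaultdict(list)
    for cell_id, sample in cells:
        by_sample[sample].append(cell_id)
    # Samples with too few cells in this cluster are simply skipped here.
    return {s: ids for s, ids in by_sample.items() if len(ids) >= min_cells}


def random_replicates(cells, n_reps, seed=0):
    """Naive alternative: shuffle all cells of the cluster and split evenly."""
    shuffled = list(cells)
    random.Random(seed).shuffle(shuffled)
    return [[cell_id for cell_id, _ in shuffled[i::n_reps]] for i in range(n_reps)]


def donors_in(replicate, cell_to_sample):
    """Set of donors contributing cells to one replicate."""
    return {cell_to_sample[c] for c in replicate}


# Toy cluster: 90 cells from three hypothetical donors of unequal size.
cells = [
    (f"{donor}_cell{i}", donor)
    for donor, n in [("donorA", 40), ("donorB", 30), ("donorC", 20)]
    for i in range(n)
]
cell_to_sample = dict(cells)

aware = sample_aware_replicates(cells, min_cells=10)
mixed = random_replicates(cells, n_reps=3)

print({s: len(ids) for s, ids in aware.items()})
print([sorted(donors_in(r, cell_to_sample)) for r in mixed])
```

In the sample-aware case every replicate maps back to one donor, so replicate-to-replicate variation reflects donor-to-donor variation. In the random split, donors are blended within replicates, which is the situation the reply warns about.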
-
Hi @rcorces, thanks again for all your help. I actually think I figured out the problem with my analysis, and I'd like to get your feedback if you wouldn't mind. In my data, it's suspected that Cell Type A-treatment has much more open chromatin than Cell Type A-control. When we look at the nFrags summary, Cell Type A-treatment has on average double the number of fragments per cell compared to Cell Type A-control, and this is the only cell type in the dataset where this occurs. When I do a pairwise analysis between these groups using `getMarkerFeatures` and include nFrags as a bias, I'm assuming that this significantly confounds the results when comparing control and treatment. Do you think using only TSSEnrichment as a bias for this analysis would be sufficient, or would removing the log10(nFrags) metric make the comparison flawed?
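To make the concern concrete, here is a hedged Python sketch of what matching a background group on bias variables can look like in general. It is not ArchR's `getMarkerFeatures` code: the z-scoring, the greedy nearest-neighbor matching, the function names, and the toy values are all assumptions. Each cell is described by (TSSEnrichment, log10(nFrags)).

```python
import math


def zscore_columns(rows):
    """Standardize each column so both bias variables are on a comparable scale."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    sds = [
        math.sqrt(sum((r[d] - means[d]) ** 2 for r in rows) / n) or 1.0
        for d in range(dims)
    ]
    return [[(r[d] - means[d]) / sds[d] for d in range(dims)] for r in rows]


def match_background(fg, bg):
    """For each foreground cell, pick the nearest unused background cell index."""
    scaled = zscore_columns(fg + bg)
    fg_s, bg_s = scaled[: len(fg)], scaled[len(fg):]
    unused = set(range(len(bg)))
    matches = []
    for f in fg_s:
        if not unused:
            break
        best = min(sorted(unused), key=lambda j: math.dist(f, bg_s[j]))
        matches.append(best)
        unused.remove(best)
    return matches


# Toy values: treatment cells are deeper (higher log10(nFrags)) than most controls.
treatment = [(10.0, 4.0), (12.0, 4.1)]
control = [(10.0, 3.4), (10.0, 3.9), (12.0, 3.5), (12.0, 4.0)]

matches = match_background(treatment, control)
matched_depth = sum(control[j][1] for j in matches) / len(matches)
overall_depth = sum(c[1] for c in control) / len(control)
print(matches, round(matched_depth, 2), round(overall_depth, 2))
```

In this toy case the matched controls are the deepest control cells, so the comparison is restricted to control cells that resemble the treatment cells in depth. Whether that is desirable when the depth difference may itself be biological is exactly the question raised above, and the sketch does not settle it.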
-
Hi, I've been reading this thread while researching what to plot from my ArchR object to show global differences in open chromatin between samples of two conditions (WT and TP53-mutated). I plotted nFrags with the following code:
[code not preserved]
Which generated the following plot:
[plot not preserved]
What can I conclude from this plot? That: [list not preserved]
Thank you so much in advance,