peak2gene links parameters and reproducibility #1728
Replies: 3 comments
-
I don't have much advice to provide here, as you're doing something that I have never done. Peak2gene links require variability to drive the correlations, so it really doesn't make sense to perform peak2gene link identification separately in order to identify sample-specific links. In fact, you may lose sample-specific links by doing this because you squash variation. It also sounds like you don't have very many cells in your dataset.
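To illustrate the point about variability, here is a toy simulation in Python. It is not ArchR's algorithm: the two cell states, the effect size, and the noise model are all assumptions made up for the sketch. A peak and a gene both shift between the two states, so they correlate strongly when the cells are pooled, but within either state alone only noise remains and the correlation largely disappears.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per_state = 500  # hypothetical number of cells per state

# Hypothetical states: 0 = peak closed / gene low, 1 = peak open / gene high.
state = np.repeat([0, 1], n_per_state)

# Both signals depend on the state plus independent noise (assumed model).
peak = 4.0 * state + rng.normal(0.0, 1.0, size=state.size)
gene = 4.0 * state + rng.normal(0.0, 1.0, size=state.size)


def pearson(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


# Correlation across all cells: driven by the between-state difference.
pooled_r = pearson(peak, gene)

# Correlation within a single state: only independent noise is left.
within_r0 = pearson(peak[state == 0], gene[state == 0])
within_r1 = pearson(peak[state == 1], gene[state == 1])

print(f"pooled: {pooled_r:.2f}")
print(f"within state 0: {within_r0:.2f}")
print(f"within state 1: {within_r1:.2f}")
```

The same logic suggests that restricting the analysis to a more homogeneous subset of cells removes the variation that the correlation test relies on.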
-
Hi Ryan,
Thank you for the quick reply. I'm not aiming to do a sample-wise analysis. The reason I ran the analyses separately is simply that we have a strong batch effect. Although Harmony can remove it from the projections, the raw data are not changed. Do you have any suggestions for removing these batch factors from the raw data, so that I can merge my data and run the peak2gene analysis?
The peak2gene link analysis matters to us for two reasons. First, it gives a list of candidate enhancers. Second, it is the basis of the regulatory network, as we really want to get potential TF-target pairs using "peak regions" as a bridge.
And yes, we don't have a large sample size, which also limits the analysis. Is there a suggested sample size that is big enough for this analysis?
Thanks again!
-
You don't need to adjust the raw data (nor is that really possible). Just use the Harmony-reduced dimensions, and this will limit the contribution of the batch effect.
-
Thank you very much for the nice package!
I would like to reconstruct a regulatory network, and so I use peak2gene links as the basis for identifying correlated region and target gene pairs.
Our data had 2 continuous stages, and we saw batch bias, so we ran the analysis independently for each stage. Surprisingly, the peak2gene links we identified have very little overlap when we compare them by the key peakName + positive/negative correlation + geneName. The overlap improves if we compare only peakName, but it is still less than 50% (whereas I saw >50% overlap in the cortex paper between scATAC-only and scMultiome data).
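For readers who want to reproduce this kind of comparison, the two overlap definitions can be sketched in Python as below. This is only an illustration: the `Link` record, the example peaks and genes, and the choice to normalise by the smaller set are assumptions, not the exact calculation used in this post or anything provided by ArchR.

```python
from typing import Callable, Hashable, Iterable, NamedTuple


class Link(NamedTuple):
    """Hypothetical record for one peak-to-gene link."""
    peak: str
    gene: str
    correlation: float


def full_key(link: Link) -> Hashable:
    """Strict key: peakName + sign of the correlation + geneName."""
    sign = "pos" if link.correlation > 0 else "neg"
    return (link.peak, sign, link.gene)


def peak_key(link: Link) -> Hashable:
    """Relaxed key: peakName only."""
    return link.peak


def overlap_fraction(a: Iterable[Link], b: Iterable[Link],
                     key: Callable[[Link], Hashable]) -> float:
    """Shared keys divided by the size of the smaller key set (assumed normalisation)."""
    keys_a = {key(link) for link in a}
    keys_b = {key(link) for link in b}
    if not keys_a or not keys_b:
        return 0.0
    return len(keys_a & keys_b) / min(len(keys_a), len(keys_b))


# Made-up links for two stages, purely for demonstration.
stage1 = [
    Link("chr1:100-600", "GeneA", 0.60),
    Link("chr1:100-600", "GeneB", 0.50),
    Link("chr2:50-550", "GeneC", -0.50),
]
stage2 = [
    Link("chr1:100-600", "GeneA", 0.70),
    Link("chr2:50-550", "GeneD", 0.55),
    Link("chr3:10-510", "GeneE", 0.60),
]

strict_overlap = overlap_fraction(stage1, stage2, full_key)
peak_only_overlap = overlap_fraction(stage1, stage2, peak_key)
print(f"strict key overlap: {strict_overlap:.2f}")
print(f"peak-only overlap: {peak_only_overlap:.2f}")
```

The strict key can never give a higher overlap than the peak-only key when both are normalised the same way, because one peak may link to different genes, or with a different sign, in each stage.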
Do you have any suggestions or comments on why the links can be so different between very similar embryonic stages? And is there a way to optimise the analysis so that the peak2gene links are more reproducible? I'm currently using the default parameters, except that I changed impute=F. I tuned k in a test run; in general, the number of links decreased when k was lower. So when the cell number n>=200, I kept k=100.
One more question: my sample size is ~3k-6k cells. I did fine clustering to get better resolution for subtypes, which gives sample sizes of 200-1k. Would you recommend doing the analysis on the overall population or on each subtype separately? We are more interested in subtype networks, but we are also aware that reducing the sample size reduces variability and thus detection power.
Looking forward to the reply & thank you in advance!