Do low-overlapping aggregates of cells violate statistical assumptions of Pearson correlation? #1720
RegnerM2015 asked this question in Questions / Documentation
All of this sounds right to me. The problem is that with smaller datasets, you essentially cannot avoid overlap. So rather than saying "you can't do peak-to-gene links unless you have X cells", the approach that we've taken is to attempt to minimize that overlap.
Hi @rcorces and @jeffmgranja,
Based on the documentation, ArchR uses a strategy similar to Cicero's to create low-overlapping aggregates of similar cells (metacells) to circumvent the sparsity in scATAC measurements. Metacells with more than 80% overlap with another metacell are filtered out to reduce bias. Therefore, some metacells likely still share cells (in other words, one scATAC-seq cell could be present in multiple metacells), meaning that the observations are not technically independent of one another.
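To make the filtering step concrete, here is a minimal sketch of how I picture an 80% overlap filter working. This is only my reading of the documentation, not ArchR's actual code: the function name `filter_overlapping`, the greedy keep-in-order strategy, and the definition of overlap as a fraction of the candidate aggregate's cells are all assumptions.

```python
def filter_overlapping(aggregates, max_overlap=0.8):
    """Greedy sketch (hypothetical, not ArchR's implementation).

    Keep an aggregate only if the fraction of its cells shared with
    every previously kept aggregate is at most `max_overlap`.
    Assumes every aggregate is non-empty.
    """
    kept = []
    for idx, cells in enumerate(aggregates):
        cells = set(cells)
        passes = all(
            len(cells & set(aggregates[k])) / len(cells) <= max_overlap
            for k in kept
        )
        if passes:
            kept.append(idx)
    return kept


# Three toy aggregates of 10 cell indices each.
aggregates = [range(0, 10), range(1, 11), range(5, 15)]
kept = filter_overlapping(aggregates)

# Cells that appear in more than one of the kept aggregates.
shared = set(aggregates[kept[0]]) & set(aggregates[kept[1]])
```

In this toy case the second aggregate shares 9 of its 10 cells with the first and is dropped, while the third shares only 5 of 10 and is kept, so the two surviving aggregates still have cells in common.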
One of the assumptions of Pearson correlation (used in `addPeak2GeneLinks`) is independence of observations (https://libguides.library.kent.edu/spss/pearsoncorr). To my understanding, this means that each observation can be counted only once. Since we have overlapping metacells, would this technically violate one of the assumptions of Pearson correlation? Or does the 80% overlap filtering step address or dampen this concern?
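To illustrate the concern, here is a toy simulation. It is a sketch under strong assumptions and not a model of ArchR's procedure: cells are laid out along one axis, aggregates are sliding windows (standing in for nearest-neighbor groups), and the two signals are independent Gaussian noise, so the true correlation is zero. The function `null_r_spread` and all parameter values are hypothetical.

```python
import numpy as np


def null_r_spread(n_aggregates, size, step, n_trials=300, seed=0):
    """Standard deviation of Pearson r across aggregates under the null.

    Aggregates are windows of `size` consecutive cells whose starts are
    `step` cells apart; step < size means neighboring aggregates overlap.
    """
    rng = np.random.default_rng(seed)
    n_cells = step * (n_aggregates - 1) + size
    starts = np.arange(n_aggregates) * step
    rs = np.empty(n_trials)
    for t in range(n_trials):
        # Two independent per-cell signals (true correlation is zero).
        x = rng.normal(size=n_cells)
        y = rng.normal(size=n_cells)
        # Window sums via cumulative sums.
        cx = np.concatenate(([0.0], np.cumsum(x)))
        cy = np.concatenate(([0.0], np.cumsum(y)))
        agg_x = cx[starts + size] - cx[starts]
        agg_y = cy[starts + size] - cy[starts]
        rs[t] = np.corrcoef(agg_x, agg_y)[0, 1]
    return rs.std()


# 50 aggregates of 100 cells: disjoint vs. 80% overlap between neighbors.
sd_disjoint = null_r_spread(n_aggregates=50, size=100, step=100)
sd_overlap = null_r_spread(n_aggregates=50, size=100, step=20)
print(f"disjoint: {sd_disjoint:.3f}  overlapping: {sd_overlap:.3f}")
```

In this setup the spread of r under the null should come out noticeably wider for the overlapping aggregates. The coefficient can still be computed as a descriptive statistic, but a p-value that treats the 50 aggregates as 50 independent observations would be too optimistic.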