Cell aggregation for co-accessibility score calculation #981
Hi,

I have a question regarding the cell aggregation performed before co-accessibility scores are calculated (also relevant for Peak2GeneLinkage). If I am not mistaken, the default is to build 500 groups of 100 cells each. The original implementation in Cicero combines only 50 cells per aggregate. Is there a reason why you increased this number?

Additionally, I am a bit worried about the "duplication" of cells across multiple aggregates/groups. Unless one adapts the aggregation settings or has a dataset with more than 50,000 cells, many cells will likely be drawn multiple times during aggregation. In the original Cicero publication, Pliner et al. discuss that "groups will sometimes contain some of the same cells, which could in principle inflate co-accessibility scores across cells." In their analysis, they kept the median number of cells shared between pairs of groups at zero. Since you don't discuss this issue in your manual, what is your take on the "duplication" of cells in multiple aggregates and the resulting inflation of correlation coefficients? Can we trust correlations computed across aggregates that share identical cells?

I see increasing numbers of co-accessible links with increasing "cell duplication rates" during aggregation. I adapted your function to draw each cell only once for aggregation. This reduced the number of detected links and their strength, but I still find 3 links per peak on average (vs. 20 links per peak at a cell duplication rate of 10). I feel a bit more comfortable with the correlation of accessibility in these independent cell aggregates. What is your take on this?

Thank you very much in advance for your answer.

Best,
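For readers following along, the arithmetic behind the 50,000-cell threshold and the "duplication rate" in the question can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the toy groups are drawn uniformly at random, which is an assumption and not how ArchR or Cicero actually pick neighbouring cells.

```python
import itertools
import random
import statistics


def mean_draws_per_cell(n_cells, n_groups=500, group_size=100):
    """Average number of aggregates each cell lands in (the "duplication rate")."""
    return n_groups * group_size / n_cells


def median_shared_cells(groups):
    """Median number of cells shared between all pairs of groups."""
    sets = [set(g) for g in groups]
    return statistics.median(
        len(a & b) for a, b in itertools.combinations(sets, 2)
    )


if __name__ == "__main__":
    # Toy data: 5,000 cells, 500 groups of 100 cells, sampled uniformly at random.
    # ASSUMPTION: real aggregates are built from nearest neighbours, so their
    # overlap pattern will differ from this uniform toy case.
    rng = random.Random(0)
    n_cells = 5000
    groups = [rng.sample(range(n_cells), 100) for _ in range(500)]

    print(mean_draws_per_cell(n_cells))  # 10.0 (500 * 100 / 5000)
    print(median_shared_cells(groups))
```

With 500 groups of 100 cells there are 50,000 cell slots to fill, so any dataset with fewer than 50,000 cells must reuse cells on average. The second function reports the quantity that Pliner et al. describe keeping at zero.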
Hi Isabelle,
I think your interpretations are all correct. I believe we increased the size of the aggregates to 100 because we found this to work better, especially in larger datasets. But you can change this as you wish.
The same caveats apply. If your dataset is small, you should adjust the default parameters, as there isn't a one-size-fits-all solution. However, I believe we provide the ability to adjust all of the necessary parameters in both …
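As a rough illustration of what "drawing each cell only once" could look like, here is a minimal Python sketch of greedy, non-overlapping aggregation. It is not ArchR's code and not the adaptation described in the question. The function name, the 2D toy coordinates, and the greedy nearest-neighbour strategy are all assumptions made for the example.

```python
import math
import random


def disjoint_aggregates(coords, group_size, seed=0):
    """Greedy sketch: every cell joins at most one aggregate.

    coords is a list of points (e.g. rows of a reduced-dimension embedding).
    Returns a list of groups, each a list of cell indices.
    """
    rng = random.Random(seed)
    unused = set(range(len(coords)))
    groups = []
    while len(unused) >= group_size:
        pool = sorted(unused)            # fixed order keeps the result reproducible
        seed_cell = rng.choice(pool)     # random seed cell among the unused cells
        # Take the seed cell's nearest neighbours among the unused cells only.
        nearest = sorted(pool, key=lambda i: math.dist(coords[i], coords[seed_cell]))
        group = nearest[:group_size]
        groups.append(group)
        unused.difference_update(group)  # these cells can never be drawn again
    return groups


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy embedding: 1,000 cells in 2D (hypothetical data).
    coords = [(rng.random(), rng.random()) for _ in range(1000)]
    groups = disjoint_aggregates(coords, group_size=50)
    print(len(groups))  # 20 (1000 // 50)
```

The trade-off is visible in the output: without reuse, the number of aggregates is capped at the cell count divided by the group size, so small datasets yield few aggregates. In this greedy version the last groups are also assembled from leftover cells that may be far apart in the embedding, which a real implementation would need to handle.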