Cluster stability #1160
-
There shouldn't be a change in the cell order, if that is what you are asking. But it would probably be better practice to obtain the cluster labels from
Maybe I am not 100% sure what you mean by cluster stability. It would seem to me that if you want to test the stability of cluster calls, then you would test that on the actual cluster calls. Otherwise you are testing the stability of the actual integration?
If I understood what you really mean by cluster stability, then I might have more to say. When I hear cluster stability, I think "how stable is this cluster call given different settings for |
-
I've been trying to run a cluster stability analysis (an SVM in Python) on integrated snATACseq and snRNAseq data generated in ArchR. This requires a cell type x gene matrix and a list of cell IDs as input.
I have extracted the necessary info using the following:
However, when I ran the cluster stability analysis, the results were similar to those obtained when cluster labels are randomly assigned. This is despite all other metrics for the data looking great, which makes me suspect there is a discrepancy in the cell label assignment above.
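To make the "similar to random labels" comparison concrete, here is a minimal sketch of a shuffled-label baseline. It rests on several assumptions: the data are synthetic, the helper name `nearest_centroid_accuracy` is hypothetical, and a simple nearest-centroid classifier stands in for the SVM so the example needs only the standard library. It is not the analysis described above, only the shape of the check.

```python
import random

def nearest_centroid_accuracy(X, y, n_folds=4):
    """Cross-validated accuracy of a nearest-centroid classifier.

    Stand-in for the SVM: each held-out cell is assigned the label of
    the closest training centroid (squared Euclidean distance).
    """
    n = len(X)
    correct = 0
    for fold in range(n_folds):
        # Strided folds: every cell is held out exactly once
        test_idx = set(range(fold, n, n_folds))
        sums, counts = {}, {}
        for i in range(n):
            if i in test_idx:
                continue
            acc = sums.setdefault(y[i], [0.0] * len(X[i]))
            for j, value in enumerate(X[i]):
                acc[j] += value
            counts[y[i]] = counts.get(y[i], 0) + 1
        centroids = {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}
        for i in test_idx:
            pred = min(
                centroids,
                key=lambda lab: sum((a - b) ** 2 for a, b in zip(X[i], centroids[lab])),
            )
            correct += pred == y[i]
    return correct / n

# Synthetic data (assumption): 60 cells, 5 features, two well-separated clusters
rng = random.Random(0)
X, y = [], []
for i in range(60):
    label = "A" if i < 30 else "B"
    centre = 0.0 if label == "A" else 5.0
    X.append([rng.gauss(centre, 1.0) for _ in range(5)])
    y.append(label)

real_acc = nearest_centroid_accuracy(X, y)

# Baseline: repeat with labels shuffled relative to the cells
shuffled_accs = []
for _ in range(20):
    y_perm = y[:]
    rng.shuffle(y_perm)
    shuffled_accs.append(nearest_centroid_accuracy(X, y_perm))
baseline = sum(shuffled_accs) / len(shuffled_accs)

print(f"real labels: {real_acc:.2f}, shuffled labels: {baseline:.2f}")
```

If the accuracy with the real labels sits at the shuffled baseline, as it does in my case, then either the clusters are genuinely unstable or the labels have been decoupled from the cells somewhere upstream.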
Could I ask:
A: Is it valid to assign the cell cluster labels in the colData to the cell IDs stored in the GeneExpressionMatrix in the manner I have? Or is there some sort of indexing going on between the colData and the GeneExpressionMatrix, meaning I've essentially randomised the cell labelling here?
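One way to rule out an ordering problem on the Python side is to join labels to the matrix by cell ID rather than by position. The following is a minimal sketch under stated assumptions: the cell IDs, the cluster names, and the helper `align_labels` are all hypothetical and are not taken from the ArchR export.

```python
def align_labels(matrix_cell_ids, meta_cell_ids, meta_labels):
    """Return labels reordered to match matrix_cell_ids, matching by cell ID."""
    if len(meta_cell_ids) != len(meta_labels):
        raise ValueError("metadata IDs and labels differ in length")
    label_by_id = dict(zip(meta_cell_ids, meta_labels))
    if len(label_by_id) != len(meta_cell_ids):
        raise ValueError("duplicate cell IDs in metadata")
    missing = [c for c in matrix_cell_ids if c not in label_by_id]
    if missing:
        raise KeyError(f"{len(missing)} matrix cells have no label, e.g. {missing[0]}")
    return [label_by_id[c] for c in matrix_cell_ids]

# Hypothetical IDs and labels: the metadata rows are in a different order
# from the matrix columns, so a positional assignment would mislabel cells.
matrix_ids = ["S1#AAAC", "S1#CCGT", "S2#GGTA"]
meta_ids = ["S2#GGTA", "S1#AAAC", "S1#CCGT"]
meta_labels = ["C3", "C1", "C2"]

aligned = align_labels(matrix_ids, meta_ids, meta_labels)
positional_ok = matrix_ids == meta_ids  # False here: the orders differ

print(aligned)        # ['C1', 'C2', 'C3']
print(positional_ok)  # False
```

If `positional_ok` is true for the real export, then positional assignment and ID-based assignment agree and the labelling is not the source of the problem.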
B: More broadly with regard to cluster stability, it's not entirely clear to me at which point in the ArchR data processing workflow it would be best to test this. Could you offer any insight on this?
I guess cell assignment based on RNA integration is still heavily dependent on the quality of the initial ATACseq data. So perhaps there is an argument that testing cluster stability before integration, on the GeneScoreMatrix data, would be the best option. (I wonder if this would even be possible given the binary nature of snATACseq data.) Yet, as clusters are often combined during integration, and it is likely we will only report cell assignments based on RNA cell ID mappings post-integration, I thought testing the GeneIntegrationMatrix data would be better. When I've tested cluster stability before on RNA-seq data, it has been fairly straightforward. Working with integrated data complicates the story a bit.
UPDATE: I get similar results when testing this pre- and post-integration, although the cell IDs are different at each stage.
C: Is a test for cluster stability something you would consider adding to the pipeline?