Cluster stability #1160

Dazcam · 2021-11-15T11:48:14Z

I've been trying to run a cluster stability analysis (an SVM in Python) on integrated snATACseq and snRNAseq data generated in ArchR. This requires a cell x gene matrix and a list of cell IDs with their cluster labels as input.

I have extracted the necessary info using the following:

```r
library(ArchR)

# Access matrix
expression_matrix_obj <- getMatrixFromProject(archR, "GeneIntegrationMatrix")

# Extract counts
counts <- expression_matrix_obj@assays@data$GeneIntegrationMatrix

# Add gene names
rownames(counts) <- expression_matrix_obj@elementMetadata$name

# Transpose count matrix
counts_transposed <- as.data.frame(t(as.matrix(counts))) # Note these are normalised values in ArchR

# Extract cluster labels
cluster_labels <- as.data.frame(as.vector(archR$Clusters))
```

However, when I ran the cluster stability analysis, the results were similar to those obtained when cluster labels are randomly assigned. This is despite all other metrics for the data looking great, which makes me suspect a discrepancy in the cell label assignment above.
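One way to tell "labels are scrambled" apart from "clusters are genuinely unstable" is to compare cross-validated classification accuracy using the real labels against a baseline where the same labels are shuffled across cells. The sketch below is a minimal base-R illustration under stated assumptions: it swaps in a simple nearest-centroid classifier for the SVM used in the post, `cv_centroid_accuracy()` is a hypothetical helper, and the data are simulated, not taken from ArchR.

```r
# Minimal sketch: cross-validated nearest-centroid accuracy (a stand-in for an SVM).
# x: cells x genes numeric matrix; y: cluster label for each row of x, same order.
cv_centroid_accuracy <- function(x, y, k = 5, seed = 1) {
  y <- as.character(y)
  set.seed(seed)
  # Randomly assign each cell to one of k folds
  folds <- sample(rep(seq_len(k), length.out = nrow(x)))
  correct <- 0
  for (f in seq_len(k)) {
    test <- folds == f
    train_x <- x[!test, , drop = FALSE]
    train_y <- y[!test]
    # Per-cluster mean profile computed from training cells only
    sums <- rowsum(train_x, train_y)
    centroids <- sums / as.vector(table(train_y)[rownames(sums)])
    xt <- x[test, , drop = FALSE]
    # Squared Euclidean distance from each held-out cell to each centroid
    d <- outer(rowSums(xt^2), rowSums(centroids^2), "+") - 2 * xt %*% t(centroids)
    # Predict the label of the nearest centroid
    pred <- rownames(centroids)[max.col(-d, ties.method = "first")]
    correct <- correct + sum(pred == y[test])
  }
  correct / nrow(x)
}

# Hypothetical simulated data: three clusters, each with its own mean profile
set.seed(42)
n_genes <- 20
labels <- rep(c("C1", "C2", "C3"), each = 50)
cluster_means <- matrix(rnorm(3 * n_genes, sd = 3), nrow = 3,
                        dimnames = list(c("C1", "C2", "C3"), NULL))
x <- cluster_means[labels, ] + matrix(rnorm(length(labels) * n_genes), ncol = n_genes)
rownames(x) <- paste0("cell", seq_along(labels))

# Baseline: the same labels randomly permuted across cells
shuffled <- sample(labels)
acc_true <- cv_centroid_accuracy(x, labels)
acc_shuffled <- cv_centroid_accuracy(x, shuffled)
```

On real data, if the accuracy with the actual labels is close to the shuffled baseline, either the labels are not aligned with the matrix rows or the clusters are not separable in that feature space. An SVM (for example from the e1071 package) could replace the nearest-centroid step without changing the comparison.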

Could I ask:

A: Is it valid to assign the cell cluster labels in the colData to the cell IDs stored in the GeneExpressionMatrix in the manner I have or is there some sort of indexing going on between the colData and the GeneExpressionMatrix, meaning I've essentially randomised the cell labelling here?

B: More broadly with regard to cluster stability, it's not entirely clear to me at which point in the ArchR data processing process it would be best to test this. Could you offer any insight on this?

I guess cell assignment based on RNA integration is still heavily dependent on the quality of the initial ATACseq data. So perhaps there is an argument that testing cluster stability before integration, on the GeneScoreMatrix data, would be the best option (though I wonder whether this would even be possible given the binary nature of snATACseq data). Yet, as clusters are often combined during integration, and we will likely only report cell assignments based on RNA cell ID mappings post-integration, I thought testing the GeneIntegrationMatrix data would be better. When I've tested cluster stability on RNA-seq data before, it has been fairly straightforward; working with integrated data complicates the story a bit.

UPDATE: I get similar results testing this pre- and post-integration, albeit with different cell IDs at each stage.

C: Will a test for cluster stability be something you would consider adding to the pipeline?

rcorces · 2021-11-15T17:08:03Z · Maintainer

A: Is it valid to assign the cell cluster labels in the colData to the cell IDs stored in the GeneExpressionMatrix in the manner I have or is there some sort of indexing going on between the colData and the GeneExpressionMatrix, meaning I've essentially randomised the cell labelling here?

There shouldn't be a change in the cell order, if that is what you are asking. But it would probably be better practice to obtain the cluster labels from colData() of the SummarizedExperiment object instead, i.e. colData(expression_matrix_obj)$Clusters in your case.
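Even when the order is expected to match, pairing labels with matrix columns by cell ID rather than by position removes the question entirely. A minimal base-R sketch with hypothetical toy objects (`counts` and `meta` stand in for the assay matrix and per-cell metadata; they are not ArchR objects), showing how positional pairing can silently mislabel cells when the two orders differ:

```r
# Hypothetical toy data: a genes x cells matrix and per-cell metadata whose
# rows are stored in a different order from the matrix columns.
counts <- matrix(1:12, nrow = 3,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"), paste0("cell", 1:4)))
meta <- data.frame(Clusters = c("C2", "C1", "C2", "C1"),
                   row.names = c("cell3", "cell1", "cell4", "cell2"),
                   stringsAsFactors = FALSE)

# Check that every matrix column has a metadata row (data.frame row names are unique)
stopifnot(all(colnames(counts) %in% rownames(meta)))

# Positional pairing: only correct if both objects share the same cell order
labels_by_position <- meta$Clusters
# Name-based pairing: look each matrix column up by its cell ID
labels_by_name <- meta[colnames(counts), "Clusters"]

# Keep cell IDs next to the labels so a downstream (e.g. Python) step can join on them
cluster_labels <- data.frame(cell_id = colnames(counts), cluster = labels_by_name,
                             stringsAsFactors = FALSE)

# With the SummarizedExperiment from getMatrixFromProject(), colData() rows always
# correspond to the assay columns, so colData(expression_matrix_obj)$Clusters is
# paired with the matrix columns by construction.
```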

B: More broadly with regard to cluster stability, it's not entirely clear to me at which point in the ArchR data processing process it would be best to test this. Could you offer any insight on this?

I may not be 100% sure what you mean by cluster stability. It seems to me that if you want to test the stability of the cluster calls, you would test that on the actual cluster calls. Otherwise, aren't you testing the stability of the integration itself?

C: Will a test for cluster stability be something you would consider adding to the pipeline?

If I understood what you really mean by cluster stability, then I might have more to say. When I hear cluster stability, I think "how stable is this cluster call given different settings for addClusters()?". That being said, we are working on other aspects of ArchR at the moment.
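Under the reading "how stable is this cluster call given different settings for addClusters()?", one common way to quantify agreement between two sets of cluster calls on the same cells is the adjusted Rand index (ARI), which is 1 for identical partitions regardless of label names and near 0 for chance-level agreement. A minimal base-R sketch; `adjusted_rand_index()` is a hypothetical helper, and the commented ArchR lines (the reducedDims name "IterativeLSI", the column name "Clusters_res1.2", the resolution value) are illustrative assumptions rather than settings from this thread.

```r
# Adjusted Rand index between two clusterings of the same cells (same order).
adjusted_rand_index <- function(a, b) {
  tab <- table(a, b)                      # contingency table of co-assignments
  comb2 <- function(n) n * (n - 1) / 2    # number of unordered pairs
  sum_ij <- sum(comb2(tab))               # pairs placed together in both clusterings
  sum_a <- sum(comb2(rowSums(tab)))
  sum_b <- sum(comb2(colSums(tab)))
  expected <- sum_a * sum_b / comb2(sum(tab))
  max_index <- (sum_a + sum_b) / 2
  (sum_ij - expected) / (max_index - expected)
}

# Identical partitions with different label names give ARI = 1
adjusted_rand_index(c(1, 1, 2, 2, 3, 3), c("x", "x", "y", "y", "z", "z"))

# Hypothetical ArchR usage: re-cluster at another resolution, then compare.
# archR <- addClusters(archR, reducedDims = "IterativeLSI",
#                      name = "Clusters_res1.2", resolution = 1.2)
# adjusted_rand_index(archR$Clusters, archR$Clusters_res1.2)
```

Repeating the comparison over a range of settings (or over subsamples of cells) gives a rough picture of which clusters persist and which split or merge.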
