With 10X multiome data, getting severe gene dropout for DAG markers compared to DEGs for specific clusters #1998

johain1 · 2023-07-27T15:25:09Z

With our integrated RNA/ATAC data and clustering, we generated hundreds of DEGs/marker genes per cluster based on gene expression. After switching to the ATAC analysis and using the Gene Activity Score as in section 7.3 of the ArchR manual, we got fewer differentially accessible gene markers, as expected for ATAC, but still many genes per cluster (100+) at the same threshold of log2FC > 1.25 and p-value < 0.05. Even our clusters with very few cells produced ~40-50, and each cluster included the expected canonical marker genes.

However, for 4 of our 17 clusters (all subclusters of one cell type), we saw severe gene dropout: very few genes passed the threshold (9, 17, 17, and 35 markers). The markers that were picked are cluster-specific, but none are expected canonical marker genes, and many are miRNAs.

We thought that, since this cell type is a progenitor of other cell types, it might resemble the other clusters too closely for any genes to clear the threshold. However, we had no such problem with the RNA DEGs. One of these clusters is also our biggest, with many thousands of cells, and yet it yielded only 17 marker genes, none of them canonical. When we manually plot the gene activity score for the expected markers, though, they look as expected.

We also thought the different subclusters might be confounding the ATAC comparison, so we merged them into one cluster for the cell type, and the same thing happened: we got only 29 genes for a huge cluster, and none of them were canonical markers or even typical genes; many were miRNAs, as before. We also considered whether doublets containing this cell type might be spread across the other clusters, but we are not sure what is going on. Perhaps we need to adjust the threshold?
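To make the threshold question concrete, here is a minimal sketch of how the marker count per cluster changes as the log2FC cutoff is relaxed. It is plain Python, not ArchR code: the `count_markers` helper, the row layout, and every value in `rows` are hypothetical placeholders, and a real run would use the exported marker table instead.

```python
from collections import Counter


def count_markers(rows, min_log2fc=1.25, max_pval=0.05):
    """Count features per cluster that pass both cutoffs."""
    counts = Counter()
    for row in rows:
        if row["log2fc"] > min_log2fc and row["pval"] < max_pval:
            counts[row["cluster"]] += 1
    return counts


# Hypothetical rows standing in for an exported marker table.
rows = [
    {"cluster": "C1", "gene": "geneA", "log2fc": 2.10, "pval": 0.001},
    {"cluster": "C1", "gene": "geneB", "log2fc": 0.57, "pval": 0.001},
    {"cluster": "C2", "gene": "geneC", "log2fc": 1.40, "pval": 0.030},
    {"cluster": "C2", "gene": "geneD", "log2fc": 0.15, "pval": 0.200},
]

# Sweep the log2FC cutoff and report how many features survive per cluster.
for cutoff in (1.25, 0.5, 0.25):
    print(cutoff, dict(count_markers(rows, min_log2fc=cutoff)))
```

A sweep like this would show whether the affected clusters gain their canonical markers at a lower cutoff or mostly pick up noise.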

Apologies for the long post. Please let me know if anyone has any ideas why this might be happening!
Thanks so much.

johain1 · 2023-07-28T16:38:05Z

I wanted to add that, looking at all genes, including those below the 1.25 Log2FC cutoff, we found our subcluster-specific marker genes far below the threshold. For instance, in one of our proliferative clusters, the proliferative marker MKI67 is assigned a Log2FC of only 0.57 and TOP2A a Log2FC of 0.15, even though both are clearly highly accessible in this cluster relative to other clusters on the UMAP. Does anyone know why this might be happening?

Edit: Taking MKI67 as an example, the trend of accessibility is correct: the Log2FC is highest in the proliferative cluster, and only clusters with visible accessibility on the UMAP have a Log2FC value. But the maximum value is only 0.57. Is there any reason it would be so low even though this cluster shows highly specific accessibility?
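For scale, a Log2FC of 0.57 corresponds to roughly a 1.48-fold difference in means, 0.15 to roughly 1.11-fold, and the 1.25 cutoff to roughly 2.38-fold. The sketch below does that conversion and also shows one generic way a fold change on group means can be compressed: an additive offset (pseudocount) applied to low-magnitude means. Whether ArchR applies such an offset to gene scores is an assumption here, not something confirmed in this thread; `log2fc_of_means` and the example means are hypothetical.

```python
import math


def fold_from_log2fc(log2fc):
    """Convert a log2 fold change back to a linear fold change."""
    return 2.0 ** log2fc


def log2fc_of_means(mean_in, mean_out, offset=0.0):
    """Log2 fold change between two group means.

    `offset` is a hypothetical pseudocount; it is an assumption for
    illustration, not a statement about how ArchR computes Log2FC.
    """
    return math.log2((mean_in + offset) / (mean_out + offset))


# Values quoted in the post, plus the cutoff.
for name, value in (("MKI67", 0.57), ("TOP2A", 0.15), ("cutoff", 1.25)):
    print(f"{name}: log2FC {value} -> {fold_from_log2fc(value):.2f}-fold")

# Made-up means: a 6-fold raw difference shrinks once an offset is added.
for offset in (0.0, 1.0):
    print(offset, round(log2fc_of_means(0.6, 0.1, offset), 2))
```

If something like this is in play, a marker can look highly specific on the UMAP and still report a small Log2FC, because the comparison is between averaged scores and not between the brightest cells.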
