Are normalized data summed when creating the RNA group Matrix within addPeak2GeneLinks? #1943
Unanswered
RegnerM2015 asked this question in Questions / Documentation
Replies: 0 comments
Hi @rcorces and ArchR users,
Creating aggregates or metacells is a critical step in many of the integrative analyses performed in ArchR. To my understanding, this creation of aggregates is performed at these lines of ArchR/R/IntegrativeAnalysis.R:
ArchR/R/IntegrativeAnalysis.R
Lines 1107 to 1118 in f6c0388
Using the group memberships defined by the knnObj, information is summed across single cells within each group or metacell. For RNA, "GeneIntegrationMatrix" is used by default, while for ATAC, "PeakMatrix" is used when invoking addPeak2GeneLinks. The "PeakMatrix" stores integer counts for peaks across single cells, while the "GeneIntegrationMatrix" is already log-normalized and scaled so that each cell's information content sums to 10,000. Please let me know if this interpretation is inaccurate.
Based on these observations, my interpretation is that integer counts are summed for each metacell in ATAC while the normalized data are summed for each metacell in RNA to create the group matrices for ATAC and RNA respectively.
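To make my reading concrete, here is a toy sketch of the summation step as I understand it. It is not ArchR's code: the function name, the tiny matrices, and the two overlapping groups standing in for the knnObj are all made up for illustration.

```python
# Toy illustration only: this is not ArchR's code, and all names are hypothetical.

def sum_by_group(matrix, groups):
    """Sum the columns (cells) of a feature-by-cell matrix within each group.

    matrix: list of rows, one per feature, each a list of per-cell values.
    groups: list of lists of cell indices, one inner list per metacell.
    Returns a feature-by-group matrix of sums.
    """
    return [[sum(row[i] for i in members) for members in groups] for row in matrix]

# Integer peak counts (2 peaks x 4 cells), standing in for "PeakMatrix".
peak_counts = [[1, 0, 2, 1],
               [0, 3, 0, 1]]

# Already-normalized expression (2 genes x 4 cells), standing in for
# "GeneIntegrationMatrix".
gene_norm = [[0.5, 1.5, 0.0, 2.0],
             [1.0, 0.0, 2.5, 0.5]]

# Two overlapping metacells defined by cell indices (standing in for knnObj).
knn_groups = [[0, 1, 2], [1, 2, 3]]

group_atac = sum_by_group(peak_counts, knn_groups)
group_rna = sum_by_group(gene_norm, knn_groups)
print(group_atac)  # [[3, 3], [3, 4]]
print(group_rna)   # [[2.0, 3.5], [3.5, 3.0]]
```

The same summation is applied to both inputs, so the ATAC group matrix holds sums of integer counts while the RNA group matrix holds sums of already-normalized values.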
After that, a counts per 10,000 normalization is applied to each group matrix followed by a log2 transformation:
ArchR/R/IntegrativeAnalysis.R
Lines 1135 to 1141 in f6c0388
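As I read it, that normalization amounts to something like the sketch below. Again, this is a toy version and not ArchR's code: the function name is hypothetical, and the pseudocount of 1 inside the log2 is my assumption.

```python
import math

# Toy illustration only: not ArchR's code. The function name, the scale_to
# default, and the pseudocount of 1 are assumptions made for this sketch.

def cp10k_log2(group_matrix, scale_to=10_000, pseudocount=1.0):
    """Scale each column (metacell) to sum to scale_to, then take log2(x + pseudocount)."""
    n_groups = len(group_matrix[0])
    col_sums = [sum(row[j] for row in group_matrix) for j in range(n_groups)]
    # Note: a metacell whose column sums to zero would raise ZeroDivisionError here.
    return [
        [math.log2(row[j] / col_sums[j] * scale_to + pseudocount) for j in range(n_groups)]
        for row in group_matrix
    ]

# A small feature-by-metacell matrix of summed values (2 genes x 2 metacells).
group_rna = [[2.0, 3.5],
             [3.5, 3.0]]
normalized = cp10k_log2(group_rna)

# Undo the log to confirm each metacell was scaled to 10,000 before the transform.
for j in range(2):
    total = sum(2 ** row[j] - 1.0 for row in normalized)
    print(round(total))
```

If the RNA input was already log-normalized before summation, this step rescales and log-transforms values that were themselves sums of log-scale values, which is the part I would like to confirm.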
Is this the intended operation for creating metacells in the RNA modality? I ask to improve my understanding of the underlying methods used in ArchR and similar tools.
While one could use the predictedCell column to map back to the raw integer counts in the Seurat object used in addGeneIntegrationMatrix, single cells in ATAC often map to only a small fraction of the cells in RNA. If raw RNA counts are used for metacell creation, this may lead to highly redundant gene expression profiles and may induce low variance in gene expression across metacells. Thus, using the "GeneIntegrationMatrix" retains more variation, as each single cell in ATAC receives an inferred gene expression profile based on its nearest neighboring cell in RNA, and this profile is adjusted according to the scRNA-scATAC anchor weights matrix (https://github.com/satijalab/seurat/blob/b56d194939379460db23380426d3896b54d91ab6/R/integration.R#L1540-L1572). Do you concur with this reasoning? Thank you for your help!