Creating Pseudobulk Replicates for Differential Analysis #1992

brindhaprath · 2023-07-18T12:41:29Z

brindhaprath
Jul 18, 2023

Hello,

First off, to the ArchR creators/moderators - thank you for creating and maintaining the ArchR pipeline! I've been learning about it and using it to analyze my data for the past month, and it has been very helpful & user-friendly. The question I have is regarding ArchR's process for creating pseudo-bulk replicates for peak calling & differential peak analysis.

My ArchR project contains 4 samples, made up of neural progenitor cells from 4 individuals: 1 control, and 3 affected persons (each with different disease states). My overall goal for analysis in ArchR is to compare control vs. disease states in terms of marker genes, marker peaks, and TF enrichment, with hypothesis generation in mind. Each of the four samples is distinct, and cannot be considered a replicate of any one condition. My concern is that, when pseudo-bulk replicates are created, single cells from different samples will be grouped together.

I've read the full ArchR manual in detail, particularly this page; it is my understanding that, for a given cluster, if there are not enough cells in one sample, cells will be combined from multiple samples in a sample-agnostic manner. Due to the sample x cluster breakdown of my dataset (see table below), I believe that the mixing of samples in pseudo-bulk replicates is inevitable.

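| Sample | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | C15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| D149 | 523 | 0 | 1 | 0 | 10 | 0 | 368 | 2136 | 1682 | 9981 | 0 | 0 | 3 | 0 | 0 |
| LS002 | 671 | 2 | 2 | 1189 | 7196 | 587 | 8 | 0 | 9 | 12 | 0 | 1 | 13 | 7 | 5 |
| LS003 | 171 | 35 | 1610 | 0 | 6 | 0 | 706 | 0 | 4 | 1 | 317 | 130 | 59 | 4865 | 9658 |
| LS004 | 88 | 14 | 16 | 0 | 2 | 0 | 140 | 2 | 2 | 0 | 0 | 2617 | 14926 | 8 | 24 |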

Will this affect my ability to do differential comparison, e.g. disease vs. control, in downstream analyses? I've read through your suggestions in Discussions #696, #1093, and #1272, which have provided some useful options. For example, I have created a new column in cellColData to represent the product of cluster and condition for each cell. However, I'm not sure how to use this in creating pseudo-bulk replicates or performing peak calling. Would it make sense in this case to pass "Sample" instead of "Clusters" to the groupBy parameter, if I am more interested in sample comparison than differentiating between clusters?

I know this question is more about the adaptation of ArchR analysis to my unique dataset, but any insight/feedback you have would be appreciated!

Thanks,
Brindha

Answered by rcorces

rcorces · 2023-07-19T15:27:50Z

rcorces
Jul 19, 2023
Maintainer

Your clusters are extremely sample-specific, so I don't think that making cluster-level pseudobulks is very helpful for you. I can't say what the right approach is for your analysis, but if you're just looking for differential testing, then the pseudobulks aren't actually used. Instead, you use a column (that you could create) in cellColData, so you can define whatever groupings you want.
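
To make the sample-specificity point concrete, here is a minimal sketch in plain Python (independent of ArchR) that takes the cell counts from the table in the question and computes, for each cluster, which sample contributes the most cells and what fraction of the cluster that is. It also lists the clusters where at least two samples have enough cells for a within-cluster comparison. The names `dominant_sample`, `MIN_CELLS`, and `SPECIFIC_FRACTION`, and the cutoff values 40 and 0.95, are assumptions chosen for illustration only; they do not come from the thread.

```python
# Cells per sample (rows) for clusters C1..C15 (columns), copied from the
# table in the question.
counts = {
    "D149":  [523, 0, 1, 0, 10, 0, 368, 2136, 1682, 9981, 0, 0, 3, 0, 0],
    "LS002": [671, 2, 2, 1189, 7196, 587, 8, 0, 9, 12, 0, 1, 13, 7, 5],
    "LS003": [171, 35, 1610, 0, 6, 0, 706, 0, 4, 1, 317, 130, 59, 4865, 9658],
    "LS004": [88, 14, 16, 0, 2, 0, 140, 2, 2, 0, 0, 2617, 14926, 8, 24],
}
clusters = [f"C{i}" for i in range(1, 16)]

# Illustrative cutoffs (assumptions, not values taken from the thread).
MIN_CELLS = 40            # minimum cells a sample needs in a cluster to count
SPECIFIC_FRACTION = 0.95  # a cluster is "sample-specific" above this fraction


def dominant_sample(counts, clusters):
    """Return {cluster: (top_sample, fraction_of_cluster_from_top_sample)}."""
    result = {}
    for j, cluster in enumerate(clusters):
        column = {sample: row[j] for sample, row in counts.items()}
        total = sum(column.values())
        top = max(column, key=column.get)
        result[cluster] = (top, column[top] / total)
    return result


dominance = dominant_sample(counts, clusters)

# Clusters that are almost entirely made up of a single sample.
specific = [c for c, (_, frac) in dominance.items() if frac >= SPECIFIC_FRACTION]

# Clusters in which at least two samples each contribute MIN_CELLS or more.
shared = [
    c for j, c in enumerate(clusters)
    if sum(row[j] >= MIN_CELLS for row in counts.values()) >= 2
]

for cluster, (sample, frac) in dominance.items():
    print(f"{cluster}: {sample} contributes {frac:.1%}")
print("sample-specific clusters:", specific)
print("clusters shared by >= 2 samples:", shared)
```

With these cutoffs, 12 of the 15 clusters draw at least 95% of their cells from a single sample, and only C1, C7, C12 and C13 have 40 or more cells from two or more samples. That is why a cluster-level grouping leaves very few clusters in which samples can be compared directly.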

2 replies

brindhaprath Jul 24, 2023
Author

Thank you for your insight! The fact that the clusters are so sample-specific is what made me realize I'd have to deviate from the manual for my analysis. However, I'm not sure what you mean when you say the pseudobulks aren't used for differential testing. I'm interested in identifying marker peaks and doing pairwise testing between groups, as well as looking into TF motif & feature enrichment. All of these downstream analyses involve the use of the Peak Matrix, which in ArchR seems to be dependent upon pseudobulk replicates.

Would grouping by sample, instead of cluster, make sense with the method that ArchR uses to create pseudobulks? Do you have any suggestions for how I might create a peak matrix without pseudo bulking, in a manner that better fits the cell population?

rcorces Jul 24, 2023
Maintainer

> All of these downstream analyses involve the use of the Peak Matrix, which in ArchR seems to be dependent upon pseudobulk replicates.

Yes, but the actual differential testing doesn't use pseudobulk replicates for statistics (unfortunately).

I don't have further recommendations - I don't provide such recommendations on individual analyses.
