Allowing Replicates to Be Annotated by Variable Other than Sample in addGroupCoverages #584

imk1 · 2021-01-26T05:04:32Z

imk1
Jan 26, 2021

Describe the problem that your feature request would address.
I am working with a dataset that has multiple technical replicates for every biological replicate. As a result, a collaborator gave me a separate arrow file for each technical replicate and a metadata file indicating the technical and biological replicates for each cell. I would like to identify IDR reproducible peaks across the 2 biological replicates and combine all of the technical replicates for each cluster.

Describe the solution you'd like
Adding an option to addGroupCoverages that lets me select a column other than Sample to annotate biological replicates would allow me to do this. Sample can remain the default.

Describe alternatives you've considered
I could ask the collaborator to combine the bam files and re-create the arrow files, but this would require substantial additional storage space and compute time. I would not be surprised if other researchers also have a separate bam file for each of multiple technical replicates of each biological replicate and would prefer not to have to combine the files for each biological replicate before using ArchR.

badoi · 2021-01-26T05:24:07Z

badoi
Jan 26, 2021

Hi Irene,

Here's a test of what would probably solve your problem. I think the errors were put in place for good reason, to prevent accidental problems, so this is a work-around that should be used w/ caution.

# just in case we try something stupid
proj$Sample_old = proj$Sample

# throws error, probably good to halt any insanity
proj$Sample = paste(proj$Sample, 'silly_things', sep = '_')

# ignore error, force overwrite, oh my that worked
proj@cellColData$Sample = paste(proj$Sample, 'silly_things', sep = '_') 

# control-z, undo undo!
proj@cellColData$Sample = proj$Sample_old 
proj@cellColData$Sample_old = NULL

rcorces · 2021-01-26T05:30:41Z

rcorces
Jan 26, 2021
Maintainer

@imk1 - thanks for this suggestion. We haven't run into this in our typical workflow but I see the utility. I don't think it will be hard to add but it might still take time. In the meantime, the suggestion from @badoi seems like a good stopgap.

imk1 · 2021-01-26T05:32:18Z

imk1
Jan 26, 2021
Author

@rcorces Thanks!
The suggestion from @badoi seems to work. To clarify for other users, you can do this:
proj@cellColData$Sample = proj$[column indicating biological replicate]

However, this fix seems to lead to the following error when running addGroupCoverages:
Error in h5checktypeOrOpenLoc(file, readonly = TRUE, native = native) :
Error in h5checktypeOrOpenLoc(). Cannot open file. File 'NA' does not exist.
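
One possible explanation, which is an assumption and not something confirmed in this thread, is that the Arrow file for each cell is looked up by its Sample value, so relabeled samples no longer match any file. The base R sketch below reproduces that kind of lookup failure; `arrow_files` and all sample names are hypothetical and it does not call ArchR.

```r
# Hypothetical stand-in for a sample -> Arrow file lookup (illustrative names only).
arrow_files <- c(
  "1A" = "1A.arrow",
  "1B" = "1B.arrow",
  "2A" = "2A.arrow",
  "2B" = "2B.arrow"
)

# The original Sample labels match the names of the lookup vector.
sample_old <- c("1A", "1B", "2A", "2B")
# Relabeled Sample values (biological replicates) do not match any name.
sample_new <- c("mouse1", "mouse1", "mouse2", "mouse2")

# Indexing a named vector by a name it lacks yields NA instead of an error.
files_old <- unname(arrow_files[sample_old])
files_new <- unname(arrow_files[sample_new])

print(files_old)
print(files_new)
```

If this is what happens, any later step that tries to open those paths would be handed NA as the file name, which would be consistent with the message above.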

jgranja24 · 2021-02-12T00:36:04Z

jgranja24
Feb 12, 2021
Maintainer

Hi @imk1, sorry for the delay. I am still working on trying to implement more stability with this. I still don't exactly follow why you wouldn't just want to treat each sample differently, but I am hoping to have a fix soon.

imk1 · 2021-02-12T00:50:24Z

imk1
Feb 12, 2021
Author

No worries about the delay, as one can always merge bam files of technical replicates (that just takes up a lot of space).

To give an example, let's say you have data from 2 mouse livers -- mouse liver 1 and mouse liver 2. Each mouse liver was split into 2 pieces. As a result, you have 4 samples -- mouse liver 1 piece A (abbreviated as 1A), mouse liver 1 piece B (abbreviated as 1B), mouse liver 2 piece A (abbreviated as 2A), and mouse liver 2 piece B (abbreviated as 2B). You sequenced each of these separately and mapped all 4 samples in parallel to speed up read mapping, so you now have 4 bam files. However, these 4 bam files represent 2 mice. If I were to make an arrow file out of each of these and run ArchR, these would be treated as 4 biological replicates even though they represent 2 mice. To prevent this from happening, I currently would merge 1A and 1B into a large bam file, merge 2A and 2B into a large bam file, make arrows out of those, and then run ArchR on those arrows. However, this leaves me with merged bam files taking up as much space again as my original bam files, roughly doubling the storage needed, and some labs have limited storage space.
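
As a rough sketch of the annotation I have in mind, the base R snippet below maps each technical replicate (liver piece) to its biological replicate (mouse) in a toy per-cell table. The table, the `tech_to_bio` mapping, and the `BioRep` column name are all hypothetical, and no ArchR functions are called.

```r
# Toy per-cell metadata; a real table would come from the collaborator's metadata file.
cell_meta <- data.frame(
  cell = c("1A#c1", "1A#c2", "1B#c1", "2A#c1", "2B#c1", "2B#c2"),
  Sample = c("1A", "1A", "1B", "2A", "2B", "2B"),
  stringsAsFactors = FALSE
)

# Hypothetical mapping from technical replicate (piece) to biological replicate (mouse).
tech_to_bio <- c("1A" = "mouse1", "1B" = "mouse1", "2A" = "mouse2", "2B" = "mouse2")

# Look up the biological replicate for every cell by its Sample label.
cell_meta$BioRep <- unname(tech_to_bio[cell_meta$Sample])

# Stop early if any Sample label is missing from the mapping.
stopifnot(!anyNA(cell_meta$BioRep))

# Count cells per biological replicate: 4 samples collapse into 2 mice.
bio_counts <- table(cell_meta$BioRep)
print(bio_counts)
```

A column like `BioRep` is what I would want to hand to addGroupCoverages in place of Sample, so that 1A and 1B are pooled as one replicate and 2A and 2B as the other.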

Implementing this feature is not that big of a deal, as not having it requires only 1 additional pre-processing step, but it occurred to me that others might also find it useful.
