TSS assignment and p2g analysis #1313

rojinsafavi · 2022-03-01T19:43:17Z

rojinsafavi
Mar 1, 2022

Does assigning an alternative TSS for a gene influence the p2g correlation analysis?

For a specific gene (NRG1), ArchR assigns the longest transcript as the main one and uses that transcript's TSS. But when I look at NRG1, I see that another isoform is more highly expressed (the one circled).

Even though the assigned TSS is different from what I visually see, the inferred expression of NRG1 is in the correct cluster (when I do the integration).

I was wondering if the TSS assignment could potentially influence the peak2gene analysis in this case?


rojinsafavi · 2022-03-07T17:32:22Z

rojinsafavi
Mar 7, 2022
Author

@rcorces I would very much appreciate your advice on this matter. Thank you so much again.

0 replies

rojinsafavi · 2022-03-11T16:15:42Z

rojinsafavi
Mar 11, 2022
Author

I think that since addReproduciblePeakSet uses a promoter region of (2000, 100), if the assigned TSS isn't that of the isoform of interest (the one with the highest accessibility at its TSS), this may influence the peak calling, which in turn might influence the p2g analysis?
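To make the promoter-window concern concrete, here is a minimal sketch using only GenomicRanges on toy data. It assumes that (2000, 100) means 2000 bp upstream and 100 bp downstream of the gene's 5' end; the gene, peak, and all coordinates are made up. It only illustrates the geometry of the window and does not show how ArchR uses it internally.

```r
library(GenomicRanges)

# Toy plus-strand gene; all coordinates are made up for illustration.
gene <- GRanges("chr1", IRanges(start = 10000, end = 90000),
                strand = "+", symbol = "GENE_A")

# Peak sitting on the TSS of an internal (alternative) isoform at position 50000.
peak <- GRanges("chr1", IRanges(start = 49900, end = 50100))

# Promoter window: 2000 bp upstream and 100 bp downstream of the 5' end.
prom_default <- promoters(gene, upstream = 2000, downstream = 100)

# Same gene after moving its start to the alternative isoform's TSS.
gene_edited <- gene
start(gene_edited) <- 50000
prom_edited <- promoters(gene_edited, upstream = 2000, downstream = 100)

# Does the peak fall inside the promoter window in each case?
in_default <- overlapsAny(peak, prom_default)
in_edited  <- overlapsAny(peak, prom_edited)
print(c(default = in_default, edited = in_edited))
```

With one range per gene, a peak at an internal isoform's TSS falls outside the promoter window unless the gene's start is moved to that TSS.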

0 replies

rcorces · 2022-03-12T04:12:30Z

rcorces
Mar 12, 2022
Maintainer

I'm sorry for not replying sooner. The genes of interest come from the geneAnnotation object. To keep the analysis straightforward, only one start and stop is used for each gene, so not every TSS is used. The positions used by addPeak2GeneLinks are therefore the positions of the genes shown by getGenes(ArchRProj). If those don't fit your analysis, you would have to edit the gene annotation information and re-run the analysis (starting from GeneScoreMatrix creation).
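As an illustration of "one start and stop per gene", the sketch below derives a single TSS per gene from a toy GRanges. It assumes the TSS is taken as the strand-aware 5' end of each gene range; the gene names and coordinates are made up, and this is not ArchR's own code.

```r
library(GenomicRanges)

# Toy annotation: one range per gene (coordinates are made up).
genes <- GRanges(
  seqnames = c("chr1", "chr1"),
  ranges   = IRanges(start = c(1000, 20000), end = c(9000, 26000)),
  strand   = c("+", "-"),
  symbol   = c("GENE_PLUS", "GENE_MINUS")
)

# Strand-aware 5' end of each range: the start for "+", the end for "-".
tss <- resize(genes, width = 1, fix = "start")
tss_pos <- setNames(start(tss), tss$symbol)
print(tss_pos)
```

Any isoform whose TSS lies inside the gene body has no representation in such a table, which is why the coordinates have to be edited if a different isoform matters.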

0 replies

rojinsafavi · 2022-03-12T17:10:23Z

rojinsafavi
Mar 12, 2022
Author

Thanks @rcorces. This might be a bug. I tried fixing NRG1 manually: I changed the NRG1 start and end to match the isoform I'm interested in, and the new GRanges object looked fine. I then passed the new GRanges to addGeneScoreMatrix and checked the genes again using getGenes, but the change had not been applied (the NRG1 start and end are not what I defined). Please let me know if this is a bug; I can post it in the issues section if it is. Thanks again!

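library(ArchR)
library(GenomicRanges)

# Copy the gene annotation and override the NRG1 coordinates (isoform of interest)
getgenes <- getGenes(projHeme2)
df <- as.data.frame(getgenes)
is_nrg1 <- df$symbol %in% "NRG1"
df$start[is_nrg1] <- 32548635
df$end[is_nrg1] <- 32764407
getgenes0 <- makeGRangesFromDataFrame(df, keep.extra.columns = TRUE)
as.data.frame(getgenes0)[is_nrg1, ]  # the new coordinates look correct here

# Recompute the gene scores with the edited gene set
projHeme2 <- addGeneScoreMatrix(input = projHeme2, genes = getgenes0, force = TRUE)

# getGenes() still reports the original NRG1 coordinates
df <- as.data.frame(getGenes(projHeme2))
df[df$symbol %in% "NRG1", ]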

2 replies

rojinsafavi Mar 12, 2022
Author

I worked around the above issue by assigning the new GRanges to the ArchR object as follows:

projHeme2@geneAnnotation$genes = getgenes0
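An alternative to the data.frame round trip is to edit the GRanges in place. The sketch below uses toy data; `set_gene_range` is a hypothetical helper (not part of ArchR or GenomicRanges), and the initial NRG1 coordinates are placeholders rather than the real annotation. Only the new start and end come from this thread.

```r
library(GenomicRanges)

# Hypothetical helper: replace the range of exactly one gene symbol.
set_gene_range <- function(gr, sym, new_start, new_end) {
  i <- which(gr$symbol == sym)
  stopifnot(length(i) == 1)  # refuse ambiguous or missing symbols
  # Replace start and end together so the range is never transiently invalid.
  ranges(gr)[i] <- IRanges(start = new_start, end = new_end)
  gr
}

# Toy annotation; the initial coordinates are placeholders.
genes <- GRanges(
  seqnames = c("chr8", "chr1"),
  ranges   = IRanges(start = c(1000, 5000), end = c(9000, 8000)),
  strand   = c("+", "+"),
  symbol   = c("NRG1", "GENE_A")
)

edited <- set_gene_range(genes, "NRG1", 32548635, 32764407)
print(edited)
```

In a real project the input would presumably be `getGenes(projHeme2)` and the result would be assigned back as above. Note that this touches only the gene ranges; any separate exon or TSS entries in the annotation would stay as they were.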

Thanks again

rcorces Mar 12, 2022
Maintainer

That is expected.

getGenes() looks at ArchRProj@geneAnnotation$genes, which you haven't actually changed.
When you run addPeak2GeneLinks(), it uses the features (genes) returned by the following:

  ArrowFiles <- getArrowFiles(ArchRProj)
  geneSet <- .getFeatureDF(ArrowFiles, useMatrix, threads = threads)
  geneStart <- GRanges(geneSet$seqnames, IRanges(geneSet$start, width = 1), name = geneSet$name, idx = geneSet$idx)

ArchR/R/IntegrativeAnalysis.R

Line 1061 in 968e442

geneSet <- .getFeatureDF(ArrowFiles, useMatrix, threads = threads)

Here, geneStart is the position used for the p2g analysis.

In this case, it takes the features from GeneIntegrationMatrix, which are in turn derived from the features in GeneScoreMatrix:

ArchR/R/RNAIntegration.R

Line 248 in 968e442

geneDF <- .getFeatureDF(getArrowFiles(ArchRProj), useMatrix)

So by changing the genes used in GeneScoreMatrix like you've done, you should change the features used for p2g analysis.

What I'm not sure about is whether there will be any issues because your gene features don't match the genes in your geneAnnotation, but I think it should be OK. The other solution would be to change the BSGenome object used for the geneAnnotation.
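One way to see the mismatch described above is to compare the positions stored with the matrix features against the annotation. The sketch below uses toy stand-ins for both tables: `geneSet` mimics the columns used in the excerpt above (seqnames, start, name, idx) and `annot` mimics what getGenes() would return. All values except the edited NRG1 start are made up, and only plus-strand genes are considered.

```r
library(GenomicRanges)

# Toy stand-in for the feature table read from the Arrow files.
geneSet <- data.frame(
  seqnames = c("chr8", "chr1"),
  start    = c(32548635, 1000),  # NRG1 start already edited
  name     = c("NRG1", "GENE_A"),
  idx      = 1:2
)

# Same construction as in the excerpt: one width-1 position per gene.
geneStart <- GRanges(geneSet$seqnames, IRanges(geneSet$start, width = 1),
                     name = geneSet$name, idx = geneSet$idx)

# Toy stand-in for the annotation, with NRG1 still at an unedited placeholder start.
annot <- GRanges(
  seqnames = c("chr8", "chr1"),
  ranges   = IRanges(start = c(31000000, 1000), end = c(32764407, 9000)),
  strand   = "+",
  symbol   = c("NRG1", "GENE_A")
)

# Genes whose p2g position differs from the annotated start.
m <- match(geneStart$name, annot$symbol)
mismatch <- geneStart$name[start(geneStart) != start(annot)[m]]
print(mismatch)
```

A non-empty result means the matrix features and the annotation disagree, which is the situation after calling addGeneScoreMatrix with edited genes but before updating the annotation.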
