Using Peak-Gene Assignments to Calculate Gene Score #576

emmawwinchester · 2021-02-23T19:52:51Z

A problem we've been running into with our data is that the default distance from the TSS used to calculate gene scores genome-wide is not reliable in non-terminally differentiated cells, such as embryonic stem cells. The default works fine for some genes but not for others. Even changing the parameters to use different distances flanking the TSS isn't reliable, because every gene lies in a different genomic context: it is surrounded by other genes at varying distances, which may or may not also be accessible and may also have active enhancers nearby.

We've tried the various settings, and we think the best way to overcome these problems would be to first assign peaks to genes (either using ArchR, or using ABCenhancergene or a similar program), then use these assignments/loops as a factor in the prediction of the gene score, instead of relying on accessibility within a certain distance of the gene. This would take into account accessibility at the TSS of the gene in addition to known biological connections between regulatory elements and the TSS.

The idea would be to use addGeneScoreMatrix, with the option geneModel=usePeakToGene(project), or something along those lines.
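To make the request concrete, here is a minimal sketch of the kind of scoring being proposed: each gene's score in a cell is the accessibility at its TSS peak plus a weighted sum over the peaks assigned to that gene. This is a toy illustration in plain Python, not ArchR code. The function name `linked_gene_scores`, its arguments, and the simple additive weighting are all assumptions made for illustration, and `geneModel=usePeakToGene(project)` remains a hypothetical option that ArchR does not provide.

```python
from collections import defaultdict


def linked_gene_scores(peak_counts, tss_peaks, links, tss_weight=1.0):
    """Score each gene per cell from its TSS peak plus its linked peaks.

    peak_counts: dict peak_id -> list of per-cell accessibility counts
    tss_peaks:   dict gene -> peak_id overlapping that gene's TSS
    links:       iterable of (peak_id, gene, weight) peak-to-gene assignments
    (All three inputs are hypothetical structures for this sketch.)
    """
    n_cells = len(next(iter(peak_counts.values())))

    # Collect the peaks assigned to each gene, with their link weights.
    linked = defaultdict(list)
    for peak_id, gene, weight in links:
        linked[gene].append((peak_id, weight))

    scores = {}
    for gene, tss_peak in tss_peaks.items():
        # Start from accessibility at the gene's own TSS.
        score = [tss_weight * c for c in peak_counts[tss_peak]]
        for peak_id, weight in linked[gene]:
            if peak_id == tss_peak:
                continue  # do not count the TSS peak twice
            for cell in range(n_cells):
                score[cell] += weight * peak_counts[peak_id][cell]
        scores[gene] = score
    return scores


# Toy data: three peaks, three cells, two genes.
peak_counts = {"p1": [2, 0, 1], "p2": [0, 3, 1], "p3": [1, 1, 0]}
tss_peaks = {"GeneA": "p1", "GeneB": "p2"}
links = [("p2", "GeneA", 0.5), ("p3", "GeneA", 0.8), ("p3", "GeneB", 0.6)]

for gene, score in linked_gene_scores(peak_counts, tss_peaks, links).items():
    print(gene, score)
```

The link weight could be a peak-to-gene correlation or an ABC score; which one works best, and whether a plain sum is the right way to combine them, is left open here.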

Is this something that would be possible to add? We've looked into adding our own patches to rig it up for our own uses, but haven't been able to thus far.
Thank you all in advance.

Answered by jgranja24

jgranja24 (Maintainer) · 2021-02-24T04:18:28Z

Hi @emmawwinchester, the main utility of Gene Scores is to identify biological labels associated with clusters based on known marker genes. This method isn't perfect, but it does work surprisingly well for lots of marker genes. Regarding your question, gene scores can't be computed this way at the moment because of how the implementation is set up. The best thing I can imagine is splitting the peaks into groups (based on the linked gene assignment) and computing module scores (see #308).

[Screenshot from that issue]
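The suggested workaround can be sketched as follows: group peaks by the gene they are linked to, then compute one score per group per cell. This is a simplified, stdlib-only Python illustration rather than ArchR code. The helper names `group_peaks_by_gene` and `module_scores` are hypothetical, and the score used here (the mean depth-normalized accessibility of the peaks in a group) is an assumption; ArchR's own module score may normalize differently, for example against background features.

```python
from collections import defaultdict


def group_peaks_by_gene(links):
    """Turn (peak_id, gene) assignments into gene -> sorted list of unique peaks."""
    groups = defaultdict(set)
    for peak_id, gene in links:
        groups[gene].add(peak_id)
    return {gene: sorted(peaks) for gene, peaks in groups.items()}


def module_scores(peak_counts, groups, scale=10_000):
    """Per-cell mean of depth-normalized accessibility over each peak group.

    peak_counts: dict peak_id -> list of per-cell counts
    groups:      dict gene -> list of peak_ids linked to that gene
    """
    n_cells = len(next(iter(peak_counts.values())))

    # Total counts per cell, used to correct for sequencing depth.
    depth = [sum(counts[cell] for counts in peak_counts.values())
             for cell in range(n_cells)]

    scores = {}
    for gene, peaks in groups.items():
        per_cell = []
        for cell in range(n_cells):
            if depth[cell] == 0:
                per_cell.append(0.0)  # empty cell: no signal to normalize
                continue
            normalized = [peak_counts[p][cell] / depth[cell] * scale for p in peaks]
            per_cell.append(sum(normalized) / len(normalized))
        scores[gene] = per_cell
    return scores


# Toy data: three peaks, two cells, two genes.
peak_counts = {"p1": [2, 0], "p2": [2, 4], "p3": [0, 4]}
links = [("p1", "GeneA"), ("p2", "GeneA"), ("p3", "GeneB")]

groups = group_peaks_by_gene(links)
print(module_scores(peak_counts, groups))
```

Each gene's peak group here plays the role of a module, so the output has one row of per-cell scores per gene.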
