Improve clumps annotation #57

@St3451

We are currently not fully satisfied with how clumps are assigned, even though the clump definition is not used in any calculation and is only stored as an annotation for downstream analysis.
The following two examples show cases where the current algorithm likely assigns incorrect labels: it defines more clumps than we actually observe (1 in the first plot and 3 or 4 in the second plot):

Image

Image

At the moment, clumps are assigned by a network-based label propagation algorithm in which the nodes of the network are the detected clusters and each edge is weighted by the ratio of mutations shared between the two clusters it connects.
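To make the current behaviour easier to discuss, here is a minimal sketch of label propagation on a cluster graph. It is not the code used in the repository: the function names, the representation of each cluster as a set of mutation identifiers, and the choice of shared mutations over the union as the edge weight are all assumptions made for illustration.

```python
from collections import defaultdict


def shared_ratio(a, b):
    """Assumed edge weight: mutations shared by both clusters over their union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(clusters):
    """Connect every pair of clusters that share at least one mutation."""
    graph = {name: {} for name in clusters}
    names = sorted(clusters)
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            weight = shared_ratio(clusters[u], clusters[v])
            if weight > 0:
                graph[u][v] = weight
                graph[v][u] = weight
    return graph


def propagate_labels(clusters, max_iter=100):
    """Asynchronous label propagation with a fixed (sorted) visiting order."""
    graph = build_graph(clusters)
    labels = {name: name for name in clusters}  # every cluster starts as its own clump
    for _ in range(max_iter):
        changed = False
        for node in sorted(clusters):
            votes = defaultdict(float)
            for neighbour, weight in graph[node].items():
                votes[labels[neighbour]] += weight
            if not votes:
                continue  # isolated cluster keeps its own label
            top = max(votes.values())
            if votes.get(labels[node], 0.0) == top:
                continue  # current label is already among the strongest
            labels[node] = min(l for l, v in votes.items() if v == top)
            changed = True
        if not changed:
            break
    return labels


# Hypothetical toy input: cluster name -> set of mutation identifiers.
example_clusters = {
    "c1": {1, 2, 3},
    "c2": {2, 3, 4},
    "c3": {10, 11},
    "c4": {11, 12},
}
example_labels = propagate_labels(example_clusters)
```

In this toy input, `c1` and `c2` end up with one label and `c3` and `c4` with another. Note that nothing in this scheme looks at cluster size, so a tiny cluster with no shared mutations always becomes its own clump.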
We need to brainstorm and design a new approach that is more conservative and minimizes the number of clumps assigned, likely by integrating the number of mutations in each cluster into the calculation (ideally, we don't want very small clusters to be independent entities when they are close to very large ones).
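As a starting point for the brainstorm, one possible direction is sketched below. It is only an illustration, not a proposal that has been tested on real data: clusters are processed from largest to smallest, and each cluster joins the clump of the largest already-processed cluster that contains a sufficient fraction of its mutations. The function name, the `min_overlap` threshold, and the use of shared mutations as the measure of closeness are all hypothetical.

```python
def assign_clumps_by_size(clusters, min_overlap=0.2):
    """Greedy, size-aware clump assignment (hypothetical sketch).

    clusters: dict mapping cluster name -> set of mutation identifiers.
    min_overlap: assumed minimum fraction of a cluster's mutations that must
        also belong to a larger cluster for it to be absorbed.
    """
    # Largest clusters first; ties broken by name for reproducibility.
    order = sorted(clusters, key=lambda name: (-len(clusters[name]), name))
    clump_of = {}
    for position, name in enumerate(order):
        mutations = clusters[name]
        clump_of[name] = name  # default: start a new clump
        # Only clusters processed earlier (at least as large) can absorb this one.
        for seed in order[:position]:
            shared = len(mutations & clusters[seed])
            if mutations and shared / len(mutations) >= min_overlap:
                clump_of[name] = clump_of[seed]
                break  # first match is the largest qualifying cluster
    return clump_of


# Hypothetical toy input: one large cluster, two small ones nearby, one far away.
toy_clusters = {
    "big": set(range(20)),
    "small": {18, 19, 20, 21},
    "tiny": {21, 22},
    "far": {100, 101, 102},
}
toy_clumps = assign_clumps_by_size(toy_clusters)
```

Here `small` is absorbed by `big`, `tiny` follows `small` into the same clump, and `far` stays separate, giving two clumps instead of four. Raising `min_overlap` makes absorption stricter, so the threshold would need tuning against plots like the two above.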

Labels: enhancement (New feature or request)
