Description
Currently, we are not completely satisfied with how clumps are assigned, even though the clump definition is not used in any calculation and is only stored as an annotation for downstream analysis.
The following two examples show cases where the current algorithm most likely assigns incorrect labels: it defines more clumps than we actually observe (1 in the first plot, and 3 or 4 in the second).
At the moment, clumps are assigned by a network-based label propagation algorithm: the nodes of the network are the detected clusters, and the edges are weighted by the ratio of mutations shared between clusters.
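For discussion, here is a minimal Python sketch of label propagation over such a cluster graph. The edge weight used here (the Jaccard index of the two clusters' mutation sets) and the names `shared_ratio`, `build_graph` and `label_propagation` are hypothetical; the actual ratio and update rule in the current implementation may differ.

```python
from collections import defaultdict


def shared_ratio(a: set, b: set) -> float:
    """Hypothetical edge weight: shared mutations / all mutations (Jaccard index)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(clusters: dict) -> dict:
    """Build an undirected weighted graph: cluster id -> {neighbour id: weight}."""
    graph = {c: {} for c in clusters}
    names = sorted(clusters)
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            w = shared_ratio(clusters[u], clusters[v])
            if w > 0:  # only clusters that share mutations are connected
                graph[u][v] = w
                graph[v][u] = w
    return graph


def label_propagation(graph: dict, max_iter: int = 100) -> dict:
    """Asynchronous label propagation; each resulting label is one clump."""
    labels = {node: node for node in graph}  # every cluster starts as its own clump
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):  # fixed order keeps the result deterministic
            neighbours = graph[node]
            if not neighbours:
                continue  # isolated clusters keep their own label
            score = defaultdict(float)
            for nb, w in neighbours.items():
                score[labels[nb]] += w  # sum edge weights per neighbouring label
            best = max(score.values())
            # adopt the heaviest neighbouring label; ties go to the smallest label
            new = min(label for label, s in score.items() if s == best)
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    return labels
```

In this update rule only edge weights matter; the number of mutations in each cluster plays no role.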
We need to brainstorm and design a new, more conservative approach that minimizes the number of clumps assigned, most likely by integrating the number of mutations in each cluster into the calculation (ideally, very small clusters should not become independent entities when they are close to very large ones).
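One possible way to bring cluster size into the assignment, sketched in Python under stated assumptions: process clusters from smallest to largest and attach each one to a connected neighbour that is substantially larger (at least `1 / size_ratio` times its size), preferring the neighbour with the highest weight × size score. The function name `size_aware_clumps`, the parameters `size_ratio` and `min_weight`, and the scoring rule are hypothetical illustrations, not a decided design.

```python
def size_aware_clumps(clusters: dict, graph: dict,
                      size_ratio: float = 0.2, min_weight: float = 0.0) -> dict:
    """Hypothetical: absorb small clusters into much larger connected neighbours.

    clusters: cluster id -> set of mutations
    graph:    cluster id -> {neighbour id: shared-mutation weight}
    Returns cluster id -> clump id (the id of the clump's root cluster).
    """
    size = {c: len(m) for c, m in clusters.items()}
    parent = {c: c for c in clusters}  # union-find forest over clusters

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Smallest clusters first, so chains small -> medium -> large resolve to the root.
    for c in sorted(clusters, key=lambda k: (size[k], k)):
        candidates = [
            (w * size[nb], nb)  # favour strong links to big clusters
            for nb, w in graph.get(c, {}).items()
            if w > min_weight and size[c] <= size_ratio * size[nb]
        ]
        if candidates:
            _, target = max(candidates)
            parent[find(c)] = find(target)
    return {c: find(c) for c in clusters}
```

Clusters of comparable size are never merged by this rule, so it would probably complement, rather than replace, a weight-based step; `size_ratio` controls how aggressively small clusters are absorbed.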

