Description
A peak is currently represented by its mz, its rt, the sd1 and sd2 of the bi-gaussian peak shape, and the peak area. Since the peak is modeled as a bi-gaussian, we can construct a probability distribution from those parameters. The Kullback-Leibler divergence measures how much one probability distribution differs from another (it is zero for identical distributions and is not symmetric). Implementing this metric for bi-gaussian peaks would allow us to calculate peak similarity. If we normalize out the peak intensity (so that overall abundance is not compared), we can compare the shapes of two peaks directly.
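For reference, the two quantities mentioned above can be written out as follows. This is a sketch under stated assumptions: sd1 is taken as the standard deviation to the left of the apex ($\sigma_1$), sd2 as the one to the right ($\sigma_2$), and rt as the apex (mode) position $\mu$. The issue itself does not fix this convention, so the sides may need to be swapped.

```latex
f(x \mid \mu, \sigma_1, \sigma_2) =
\frac{\sqrt{2/\pi}}{\sigma_1 + \sigma_2}
\begin{cases}
\exp\!\left(-\dfrac{(x-\mu)^2}{2\sigma_1^2}\right), & x < \mu \\[2ex]
\exp\!\left(-\dfrac{(x-\mu)^2}{2\sigma_2^2}\right), & x \ge \mu
\end{cases}
\qquad
D_{\mathrm{KL}}(p \,\|\, q) = \int_{-\infty}^{\infty} p(x)\,\log\frac{p(x)}{q(x)}\,\mathrm{d}x
```

With this normalization $f$ integrates to one, so a peak with a given area has the intensity profile $\text{area} \cdot f(x)$.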
The task is to implement this calculation in two ways: one including the area (so the peak intensity is part of the comparison) and one using a normalized area (ignoring intensity and focusing only on peak shape). Note that ignoring the intensity does not mean rescaling all peaks to unit height, as this would change the actual standard deviation values of the peak itself.
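A minimal numerical sketch of both variants, using only the Python standard library. Everything here is hypothetical and not part of the existing code base: the names `bigaussian_logpdf` and `peak_kl`, the `(rt, sd1, sd2, area)` tuple layout, and the assumption that sd1 is the left-hand and sd2 the right-hand standard deviation. The area-including variant uses the generalized KL divergence for unnormalized curves, which is only one possible reading of "including the area".

```python
import math


def bigaussian_logpdf(x, mu, sd1, sd2):
    """Log of the unit-area bi-gaussian (split-normal) density at x.

    Assumption: sd1 is the spread left of the apex mu, sd2 right of it.
    """
    sd = sd1 if x < mu else sd2
    return (0.5 * math.log(2.0 / math.pi) - math.log(sd1 + sd2)
            - 0.5 * ((x - mu) / sd) ** 2)


def peak_kl(p, q, use_area=False, n=4001, width=10.0):
    """Numerical KL divergence between peaks given as (rt, sd1, sd2, area).

    use_area=False: compares unit-area shapes, so intensity is ignored
    while sd1 and sd2 are left untouched.
    use_area=True: generalized KL divergence of the area-scaled curves
    (an assumption; the issue does not define this variant).
    """
    mu_p, s1p, s2p, a_p = p
    mu_q, s1q, s2q, a_q = q
    # Integrate over a window that covers both peaks out to `width` sigmas.
    lo = min(mu_p - width * s1p, mu_q - width * s1q)
    hi = max(mu_p + width * s2p, mu_q + width * s2q)
    h = (hi - lo) / (n - 1)
    log_ap = math.log(a_p) if use_area else 0.0
    log_aq = math.log(a_q) if use_area else 0.0
    total = 0.0
    for i in range(n):
        x = lo + i * h
        # Working in log space avoids log(0) where a tail underflows.
        log_P = log_ap + bigaussian_logpdf(x, mu_p, s1p, s2p)
        log_Q = log_aq + bigaussian_logpdf(x, mu_q, s1q, s2q)
        P = math.exp(log_P)
        value = P * (log_P - log_Q)
        if use_area:
            value += math.exp(log_Q) - P
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoidal rule
        total += weight * value
    return total * h
```

Calling `peak_kl(a, b)` and `peak_kl(b, a)` generally gives different values, so a symmetric score would need both directions combined. For the area-including variant, the result equals `a_p * KL(shape) + a_p * log(a_p / a_q) - a_p + a_q`, which makes the contribution of the intensity difference explicit.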
There are also some other divergence functions which could be used to measure the dissimilarity (see here for an overall summary):
Kullback-Leibler Divergence
Jensen-Shannon Divergence
Wasserstein Distance
Note that the bi-gaussian falls into the category of split-normal distributions, and some of these metrics may not be defined for it. This article gives a good introduction to the topic: https://projecteuclid.org/journalArticle/Download?urlId=10.1214%2F13-STS417
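The other two measures can be approximated on a grid in the same spirit. The sketch below is an illustration under assumptions, not a reference implementation: the function names and the `(rt, sd1, sd2)` tuple layout are hypothetical, sd1 is again assumed to be the left-hand spread, and the Wasserstein distance computed is the first-order (W1) distance, using the fact that in one dimension it equals the integral of the absolute difference of the two CDFs.

```python
import math


def bigaussian_pdf(x, mu, sd1, sd2):
    """Unit-area bi-gaussian density (assumption: sd1 left, sd2 right)."""
    sd = sd1 if x < mu else sd2
    norm = math.sqrt(2.0 / math.pi) / (sd1 + sd2)
    return norm * math.exp(-0.5 * ((x - mu) / sd) ** 2)


def js_and_wasserstein(p, q, n=4001, width=10.0):
    """Return (Jensen-Shannon divergence in nats, W1 distance in rt units).

    p and q are (rt, sd1, sd2) tuples; both values are grid approximations.
    """
    mu_p, s1p, s2p = p
    mu_q, s1q, s2q = q
    lo = min(mu_p - width * s1p, mu_q - width * s1q)
    hi = max(mu_p + width * s2p, mu_q + width * s2q)
    h = (hi - lo) / (n - 1)
    js = 0.0
    w1 = 0.0
    cdf_p = cdf_q = 0.0
    prev_p = prev_q = 0.0
    for i in range(n):
        x = lo + i * h
        fp = bigaussian_pdf(x, mu_p, s1p, s2p)
        fq = bigaussian_pdf(x, mu_q, s1q, s2q)
        m = 0.5 * (fp + fq)  # mixture density used by the JS divergence
        term = 0.0
        if m > 0.0:
            if fp > 0.0:
                term += 0.5 * fp * (math.log(fp) - math.log(m))
            if fq > 0.0:
                term += 0.5 * fq * (math.log(fq) - math.log(m))
        if i > 0:
            # Running trapezoidal CDFs of both peaks.
            cdf_p += 0.5 * (prev_p + fp) * h
            cdf_q += 0.5 * (prev_q + fq) * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        js += weight * term * h
        w1 += weight * abs(cdf_p - cdf_q) * h
        prev_p, prev_q = fp, fq
    return js, w1
```

The Jensen-Shannon divergence is symmetric and bounded by `log(2)` in nats, which can make it easier to threshold than the KL divergence. The W1 distance is expressed in the units of rt, so two peaks with identical shape that are shifted by some retention time difference have a W1 distance equal to that shift.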
A good starting point would be to implement various probability density functions, including split-normal distributions, to get familiar with how they work, and then to implement the distance measures. AI chatbots can be very useful for this but should be used with caution, since their claims on this topic are very difficult to verify.
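One way to build that familiarity, and to cross-check formulas obtained from any source, is to compare closed-form moments of the split-normal distribution against random samples. The sketch below uses hypothetical helper names and again assumes that sd1 is the left-hand and sd2 the right-hand standard deviation around the mode `mu`.

```python
import math
import random


def sample_bigaussian(mu, sd1, sd2, rng):
    """Draw one value from the bi-gaussian (split-normal) distribution.

    The left half carries probability sd1 / (sd1 + sd2); each half is a
    half-normal with its own standard deviation.
    """
    if rng.random() < sd1 / (sd1 + sd2):
        return mu - abs(rng.gauss(0.0, sd1))
    return mu + abs(rng.gauss(0.0, sd2))


def bigaussian_mean(mu, sd1, sd2):
    """Closed-form mean; it differs from the mode mu when sd1 != sd2."""
    return mu + math.sqrt(2.0 / math.pi) * (sd2 - sd1)


def bigaussian_variance(sd1, sd2):
    """Closed-form variance of the split-normal distribution."""
    return (1.0 - 2.0 / math.pi) * (sd2 - sd1) ** 2 + sd1 * sd2


if __name__ == "__main__":
    rng = random.Random(0)
    mu, sd1, sd2 = 10.0, 1.0, 3.0
    draws = [sample_bigaussian(mu, sd1, sd2, rng) for _ in range(200000)]
    sample_mean = sum(draws) / len(draws)
    sample_var = sum((x - sample_mean) ** 2 for x in draws) / len(draws)
    print("mean:", sample_mean, "expected:", bigaussian_mean(mu, sd1, sd2))
    print("var: ", sample_var, "expected:", bigaussian_variance(sd1, sd2))
```

If the sampled and closed-form values disagree beyond sampling noise, either the density or the formula being checked is wrong, which gives a cheap independent check before moving on to the divergence measures.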