Implement method to calculate the Kullback-Leibler Divergence between two features

The current representation of a peak includes the `mz`, the `rt`, the `sd1` and `sd2` of the bi-gaussian peak shape, and the peak `area`. Because the peak is modeled as a bi-gaussian, we can construct a probability distribution from those parameters. The Kullback-Leibler divergence measures how much one probability distribution differs from another, so implementing it for bi-gaussian peaks would give us a measure of peak (dis)similarity. If we normalize the peak intensity (so that the overall abundance is not part of the comparison), we can compare the shapes of two peaks directly.
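As a minimal sketch of how such a distribution could be built from the peak parameters, the snippet below evaluates a unit-area bi-gaussian and its area-scaled intensity profile. It assumes that `rt` is the apex, that `sd1` describes the left side and `sd2` the right side, and the function names are hypothetical, not taken from the code base.

```python
import math


def bigaussian_pdf(x, rt, sd1, sd2):
    """Unit-area bi-gaussian (split-normal) density at x.

    Assumption: sd1 is the spread left of the apex rt, sd2 the spread right of it.
    """
    norm = math.sqrt(2.0 / math.pi) / (sd1 + sd2)  # makes the total area equal 1
    sd = sd1 if x < rt else sd2
    return norm * math.exp(-((x - rt) ** 2) / (2.0 * sd * sd))


def bigaussian_intensity(x, rt, sd1, sd2, area):
    """Intensity profile: the unit-area shape scaled by the peak area."""
    return area * bigaussian_pdf(x, rt, sd1, sd2)
```

Dividing the intensity profile by `area` recovers the density, so the shape parameters `sd1` and `sd2` stay unchanged under this normalization.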

The task is to implement this calculation in two ways: including the area (so that the peak intensity is part of the comparison) and using a normalized area (ignoring intensity and focusing only on peak shape). Note that ignoring the intensity does not mean rescaling all peaks to unit height or similar, as this would change the actual standard deviation values of the peak itself.
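The two variants could be sketched as below, using numerical integration on a grid. This is one possible reading of the task, not a settled design: peaks are passed as plain dicts (a hypothetical representation), `sd1` is assumed to be the left-hand spread, and for the area-including variant the sketch uses the generalized KL divergence for unnormalized curves, i.e. the integral of `p*log(p/q) - p + q`, which is an assumption rather than something the issue prescribes.

```python
import math


def log_bigaussian_pdf(x, rt, sd1, sd2):
    """Log of the unit-area bi-gaussian density (sd1 left of rt, sd2 right)."""
    sd = sd1 if x < rt else sd2
    return (0.5 * math.log(2.0 / math.pi) - math.log(sd1 + sd2)
            - (x - rt) ** 2 / (2.0 * sd * sd))


def kl_divergence(p, q, include_area=False, n=20001, width=10.0):
    """Numerically integrate KL(p || q) for peaks given as dicts.

    include_area=False: compare unit-area shapes (classical KL divergence).
    include_area=True: scale each shape by its area and use the generalized
    KL divergence for unnormalized curves (an assumed choice).
    """
    log_ap = math.log(p["area"]) if include_area else 0.0
    log_aq = math.log(q["area"]) if include_area else 0.0
    # Integrate over p's effective support; outside it p is negligible.
    lo = p["rt"] - width * p["sd1"]
    hi = p["rt"] + width * p["sd2"]
    step = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * step
        log_p = log_ap + log_bigaussian_pdf(x, p["rt"], p["sd1"], p["sd2"])
        log_q = log_aq + log_bigaussian_pdf(x, q["rt"], q["sd1"], q["sd2"])
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoidal end points
        total += weight * math.exp(log_p) * (log_p - log_q)
    total *= step
    if include_area:
        # Integral of (q - p) over the whole axis is area_q - area_p.
        total += q["area"] - p["area"]
    return total


peak_a = {"rt": 10.0, "sd1": 1.0, "sd2": 2.0, "area": 500.0}
peak_b = {"rt": 10.5, "sd1": 1.5, "sd2": 1.5, "area": 200.0}
shape_only = kl_divergence(peak_a, peak_b)
with_area = kl_divergence(peak_a, peak_b, include_area=True)
```

Working with log densities avoids taking the logarithm of values that underflow to zero far out in the tails. Note that the KL divergence is not symmetric, so `kl_divergence(peak_a, peak_b)` and `kl_divergence(peak_b, peak_a)` generally differ.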

There are also other divergence functions that could be used to measure the dissimilarity (see [here](https://en.wikipedia.org/wiki/Statistical_distance) for an overall summary):

- [Kullback-Leibler Divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence)
- [Jensen-Shannon Divergence](https://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence)
- [Wasserstein Distance](https://en.wikipedia.org/wiki/Wasserstein_metric)
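For comparison, the Jensen-Shannon divergence and the 1-Wasserstein distance can be sketched numerically for unit-area bi-gaussians as follows. Peaks are given as hypothetical `(rt, sd1, sd2)` tuples with `sd1` assumed to be the left-hand spread; the Wasserstein distance uses the one-dimensional identity that it equals the integral of the absolute difference between the two cumulative distribution functions.

```python
import math


def bigaussian_pdf(x, rt, sd1, sd2):
    """Unit-area bi-gaussian density (sd1 left of rt, sd2 right)."""
    sd = sd1 if x < rt else sd2
    return (math.sqrt(2.0 / math.pi) / (sd1 + sd2)
            * math.exp(-((x - rt) ** 2) / (2.0 * sd * sd)))


def bigaussian_cdf(x, rt, sd1, sd2):
    """Cumulative distribution function of the unit-area bi-gaussian."""
    sd = sd1 if x < rt else sd2
    phi = 0.5 * (1.0 + math.erf((x - rt) / (sd * math.sqrt(2.0))))
    # Mass left of rt is sd1/(sd1+sd2); each side is a rescaled half-normal.
    return (sd1 + 2.0 * sd * (phi - 0.5)) / (sd1 + sd2)


def _grid(p, q, n, width):
    """Evenly spaced grid covering the effective support of both peaks."""
    lo = min(p[0] - width * p[1], q[0] - width * q[1])
    hi = max(p[0] + width * p[2], q[0] + width * q[2])
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)], step


def jensen_shannon(p, q, n=20001, width=10.0):
    """Jensen-Shannon divergence in nats (bounded above by ln 2)."""
    xs, step = _grid(p, q, n, width)
    total = 0.0
    for x in xs:
        fp, fq = bigaussian_pdf(x, *p), bigaussian_pdf(x, *q)
        m = 0.5 * (fp + fq)
        if m <= 0.0:
            continue  # both densities underflowed to zero here
        for f in (fp, fq):
            if f > 0.0:  # 0 * log(0) is taken as 0
                total += 0.5 * f * (math.log(f) - math.log(m))
    return total * step


def wasserstein_1(p, q, n=20001, width=10.0):
    """1-Wasserstein distance as the integral of |F_p - F_q|."""
    xs, step = _grid(p, q, n, width)
    return step * sum(abs(bigaussian_cdf(x, *p) - bigaussian_cdf(x, *q))
                      for x in xs)
```

Unlike the Kullback-Leibler divergence, both measures are symmetric in their arguments. The Wasserstein distance is expressed in the units of the retention time axis, so two identically shaped peaks shifted by 3 units are a distance of 3 apart.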

Note that the bi-gaussian falls into the category of [split-normal distributions](https://en.wikipedia.org/wiki/Split_normal_distribution), for which these metrics may not be defined. This article gives a good introduction to the topic: https://projecteuclid.org/journalArticle/Download?urlId=10.1214%2F13-STS417

A good starting point would be to implement various probability density functions, including split-normal distributions, to get familiar with how they work, and then move on to the distance measures. AI chatbots can be very useful for this but should be used with caution, since their claims are difficult to verify here.
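One way to check an implementation independently is sketched below: a Monte Carlo estimate of the KL divergence is compared with the known closed form for ordinary Gaussians, which a bi-gaussian reduces to when `sd1 == sd2`. All function names are hypothetical, peaks are `(rt, sd1, sd2)` tuples, and `sd1` is again assumed to be the left-hand spread.

```python
import math
import random


def sample_bigaussian(rng, rt, sd1, sd2):
    """Draw one value: pick a side with probability proportional to its sd,
    then add a half-normal offset scaled by that side's sd."""
    offset = abs(rng.gauss(0.0, 1.0))
    if rng.random() < sd1 / (sd1 + sd2):
        return rt - sd1 * offset
    return rt + sd2 * offset


def log_pdf(x, rt, sd1, sd2):
    """Log of the unit-area bi-gaussian density."""
    sd = sd1 if x < rt else sd2
    return (0.5 * math.log(2.0 / math.pi) - math.log(sd1 + sd2)
            - (x - rt) ** 2 / (2.0 * sd * sd))


def kl_monte_carlo(p, q, n=100_000, seed=0):
    """Estimate KL(p || q) as the mean of log p(x) - log q(x) for x drawn from p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = sample_bigaussian(rng, *p)
        total += log_pdf(x, *p) - log_pdf(x, *q)
    return total / n


def kl_gaussian(mu_p, sd_p, mu_q, sd_q):
    """Closed-form KL divergence between two ordinary Gaussians."""
    return (math.log(sd_q / sd_p)
            + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sd_q ** 2) - 0.5)


# With sd1 == sd2 the estimate should agree with the Gaussian closed form.
estimate = kl_monte_carlo((0.0, 1.0, 1.0), (1.0, 2.0, 2.0))
reference = kl_gaussian(0.0, 1.0, 1.0, 2.0)
```

Agreement in the symmetric case does not prove the asymmetric case is right, but it catches sign errors, swapped arguments, and wrong normalization constants cheaply.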
