|
| 1 | +# Energy Distance Formulae for Probability Distributions |
| 2 | + |
| 3 | +This file collects analytic formulae, derivations, and summary tables for energy distance calculations ($\mathbb{E}|X-Y|$, where $X$ and $Y$ are independent draws from the same distribution) for probability distributions implemented in skpro. |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +## Summary Table (energy_report.tex) |
| 8 | + |
| 9 | +| Distribution | Analytic | Monte Carlo | Abs Error | |
| 10 | +|---------------------|-----------|-------------|-----------| |
| 11 | +| Beta(2,3) | 0.228571 | 0.228128 | 0.000444 | |
| 12 | +| ChiSquared(2) | 2.000000 | 2.000000 | 0.000000 | |
| 13 | +| Exponential(2) | 0.500000 | 0.500166 | 0.000166 | |
| 14 | +| Gamma(2,3) | 0.500000 | 0.497054 | 0.002946 | |
| 15 | +| Logistic(0,1) | 2.000000 | 1.997396 | 0.002604 | |
| 16 | +| LogNormal(0,1) | 1.716318 | 1.716318 | 0.000000 | |
| 17 | +| Pareto(1,3) | 0.600000 | 0.599371 | 0.000629 | |
| 18 | +| T(0,1,5) | 1.383983 | 1.383983 | 0.000000 | |
| 19 | +| Weibull(1,2) | 0.519140 | 0.521159 | 0.002019 | |
| 20 | + |
| 21 | +## Summary Table (energy_report2.tex) |
| 22 | + |
| 23 | +| Distribution | Analytic | Monte Carlo | Abs Error | |
| 24 | +|-------------------------------|-----------|-------------|-----------| |
| 25 | +| InverseGamma(3,2) | 0.750000 | 0.749457 | 0.000543 | |
| 26 | +| InverseGaussian(2,1) | 2.272415 | 2.278161 | 0.005746 | |
| 27 | +| LogGamma(2) | 0.886294 | 0.692743 | 0.193551 | |
| 28 | +| Poisson(3) | 1.907392 | 1.910830 | 0.003438 | |
| 29 | +| TruncatedNormal(0,1,-1,2) | 0.824430 | 0.824881 | 0.000451 | |
| 30 | + |
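The Monte Carlo column in the tables above can be approximated with a pairwise sampling estimate. The sketch below is a minimal illustration using only the Python standard library, not the script that generated the tables; the function name `mc_energy`, the sample size, and the seed are assumptions made for this example.

```python
import random


def mc_energy(sampler, n=200_000, seed=0):
    """Estimate E|X - Y| from n independent pairs drawn by `sampler(rng)`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # draw X and Y independently from the same distribution
        total += abs(sampler(rng) - sampler(rng))
    return total / n


# Exponential with rate 2; the analytic value in the table is 0.5.
estimate = mc_energy(lambda rng: rng.expovariate(2.0))
print(f"Monte Carlo estimate: {estimate:.4f}")
```

The estimate fluctuates with the seed and sample size, so small differences from the analytic column are expected.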
| 31 | +## Example: Beta(2,3) |
| 32 | + |
| 33 | +**PDF:** |
| 34 | +$$ |
| 35 | +f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \quad x \in (0,1) |
| 36 | +$$ |
| 37 | + |
| 38 | +**Energy:** |
| 39 | +$$ |
| 40 | +\mathbb{E}|X-Y| = 2 \int_0^1 F(t)(1-F(t)) dt |
| 41 | +$$ |
| 42 | +where $F$ is the Beta CDF. |
| 43 | + |
| 44 | +**Derivation:** |
| 45 | +Let $F$ be the CDF of Beta$(\alpha,\beta)$. The energy is: |
| 46 | +$$ |
| 47 | +\mathbb{E}|X-Y| = 2 \int_{-\infty}^{\infty} F(t)(1-F(t)) dt |
| 48 | +$$ |
| 49 | +For Beta, the support is $(0,1)$, so $F(t)(1-F(t))$ vanishes outside it and the integral reduces to $2 \int_0^1 F(t)(1-F(t)) dt$. |
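
For $\alpha=2$, $\beta=3$ the CDF is a polynomial, so the integral over $(0,1)$ can be evaluated by hand. The worked steps below are a sketch added for illustration; they agree with the tabulated analytic value.

$$
F(t) = 6t^2 - 8t^3 + 3t^4, \quad t \in (0,1)
$$
$$
\int_0^1 F(t)\,dt = \frac{3}{5}, \qquad \int_0^1 F(t)^2\,dt = \frac{36}{5} - 16 + \frac{100}{7} - 6 + 1 = \frac{17}{35}
$$
$$
\mathbb{E}|X-Y| = 2\left(\frac{3}{5} - \frac{17}{35}\right) = \frac{8}{35} \approx 0.228571
$$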
| 50 | + |
| 51 | +## Example: Weibull(1,2) |
| 52 | + |
| 53 | +**PDF:** |
| 54 | +$$ |
| 55 | +f(x) = 2x e^{-x^2}, \quad x > 0 |
| 56 | +$$ |
| 57 | + |
| 58 | +**Energy:** |
| 59 | +$$ |
| 60 | +\mathbb{E}|X-Y| = 2 \int_0^\infty F(t)(1-F(t)) dt |
| 61 | +$$ |
| 62 | +where $F$ is the Weibull CDF. |
| 63 | + |
| 64 | +**Derivation:** |
| 65 | +Let $F$ be the CDF of Weibull$(\lambda, k)$ ($\lambda=1$, $k=2$): |
| 66 | +$$ |
| 67 | +F(t) = 1 - e^{-t^2} |
| 68 | +$$ |
| 69 | +Plug into the general formula. |
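
Carrying out the substitution gives a closed form. The steps below are a worked sketch using the Gaussian integral $\int_0^\infty e^{-a t^2} dt = \tfrac{1}{2}\sqrt{\pi/a}$:

$$
F(t)(1-F(t)) = e^{-t^2} - e^{-2t^2}
$$
$$
\mathbb{E}|X-Y| = 2\left(\frac{\sqrt{\pi}}{2} - \frac{\sqrt{\pi}}{2\sqrt{2}}\right) = \sqrt{\pi}\left(1 - \frac{1}{\sqrt{2}}\right) \approx 0.519140
$$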
| 70 | + |
| 71 | +## Example: Inverse Gaussian(2,1) |
| 72 | + |
| 73 | +**PDF:** |
| 74 | +$$ |
| 75 | +f(x) = \left(\frac{1}{2\pi x^3}\right)^{1/2} \exp\left(-\frac{(x-2)^2}{2 \cdot 2^2 x}\right), \quad x > 0 |
| 76 | +$$ |
| 77 | + |
| 78 | +**Energy:** |
| 79 | +$$ |
| 80 | +\mathbb{E}|X-Y| = 2 \int_0^\infty F(t)(1-F(t)) dt |
| 81 | +$$ |
| 82 | +where $F$ is the Inverse Gaussian CDF. |
| 83 | + |
| 84 | +**Derivation:** |
| 85 | +Plug the CDF of Inverse Gaussian into the general formula. |
| 86 | + |
| 87 | +## Example: LogGamma(2) |
| 88 | + |
| 89 | +**PDF:** |
| 90 | +$$ |
| 91 | +f(x) = \frac{1}{\Gamma(2)} e^{2x - e^x}, \quad x \in \mathbb{R} |
| 92 | +$$ |
| 93 | + |
| 94 | +**Energy:** |
| 95 | +$$ |
| 96 | +\mathbb{E}|X-Y| = 2 \int_{-\infty}^{\infty} F(t)(1-F(t)) dt |
| 97 | +$$ |
| 98 | +where $F$ is the LogGamma CDF. |
| 99 | + |
| 100 | +**Derivation:** |
| 101 | +Plug the CDF of LogGamma into the general formula. |
| 102 | + |
| 103 | +## Example: Poisson(3) |
| 104 | + |
| 105 | +**PMF:** |
| 106 | +$$ |
| 107 | +P(X = k) = \frac{3^k e^{-3}}{k!}, \quad k = 0, 1, 2, \dots |
| 108 | +$$ |
| 109 | + |
| 110 | +**Energy:** |
| 111 | +$$ |
| 112 | +\mathbb{E}|X-Y| = \sum_{k=0}^\infty \sum_{l=0}^\infty |k-l| P(X=k)P(Y=l) |
| 113 | +$$ |
| 114 | +where $X, Y \sim \text{Poisson}(3)$ i.i.d. |
| 115 | + |
| 116 | +**Derivation:** |
| 117 | +For discrete distributions, the energy is the expected absolute difference between two independent samples. |
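
In practice the double sum has to be truncated. The sketch below shows one way to do that with the Python standard library; the helper names `poisson_pmf` and `poisson_energy` and the cutoff `k_max` are assumptions for this example, and the result depends on the cutoff chosen.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for Poisson(lam), computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_energy(lam, k_max=60):
    """Truncated double sum of |k - l| P(X=k) P(Y=l) over 0 <= k, l <= k_max."""
    pmf = [poisson_pmf(k, lam) for k in range(k_max + 1)]
    total = 0.0
    for k in range(k_max + 1):
        for l in range(k):
            # the summand is symmetric in (k, l), so each pair counts twice
            total += 2.0 * (k - l) * pmf[k] * pmf[l]
    return total


print(f"Truncated-sum energy for Poisson(3): {poisson_energy(3.0):.6f}")
```

Raising `k_max` until the value stops changing is a simple way to check that the truncation error is negligible.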
| 118 | + |
| 119 | + |
| 120 | +## Example: Truncated Normal(0,1,-1,2) |
| 121 | + |
| 122 | +**PDF:** |
| 123 | +$$ |
| 124 | +f(x) = \frac{\phi(x)}{\Phi(2) - \Phi(-1)}, \quad x \in (-1,2) |
| 125 | +$$ |
| 126 | +where $\phi$ is the standard normal PDF and $\Phi$ is the CDF. |
| 127 | + |
| 128 | +**Energy:** |
| 129 | +$$ |
| 130 | +\mathbb{E}|X-Y| = 2 \int_{-1}^{2} F(t)(1-F(t)) dt |
| 131 | +$$ |
| 132 | +where $F$ is the CDF of the truncated normal. |
| 133 | + |
| 134 | +**Derivation:** |
| 135 | +Plug the CDF of the truncated normal into the general formula, integrating over the truncated support. |
| 136 | + |
| 137 | +## Example: Exponential(2) |
| 138 | + |
| 139 | +**PDF:** |
| 140 | +$$ |
| 141 | +f(x) = 2 e^{-2x}, \quad x > 0 |
| 142 | +$$ |
| 143 | + |
| 144 | +**Energy:** |
| 145 | +$$ |
| 146 | +\mathbb{E}|X-Y| = \frac{1}{\lambda} |
| 147 | +$$ |
| 148 | +For Exponential$(\lambda)$, $\lambda=2$, so $\mathbb{E}|X-Y| = 0.5$. |
| 149 | + |
| 150 | +**Derivation:** |
| 151 | +For $X, Y \sim \text{Exp}(\lambda)$ i.i.d., |
| 152 | +$$ |
| 153 | +\mathbb{E}|X-Y| = \frac{1}{\lambda} |
| 154 | +$$ |
| 155 | +This follows from integrating the absolute difference of two independent exponentials. |
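
The same result can be obtained from the general CDF formula. With $F(t) = 1 - e^{-\lambda t}$ for $t > 0$, a short worked sketch is:

$$
\mathbb{E}|X-Y| = 2 \int_0^\infty \left(e^{-\lambda t} - e^{-2\lambda t}\right) dt = 2\left(\frac{1}{\lambda} - \frac{1}{2\lambda}\right) = \frac{1}{\lambda}
$$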
| 156 | + |
| 157 | +## Example: Gamma(2,3) |
| 158 | + |
| 159 | +**PDF:** |
| 160 | +$$ |
| 161 | +f(x) = \frac{3^2}{\Gamma(2)} x^{2-1} e^{-3x} = 9x e^{-3x}, \quad x > 0 |
| 162 | +$$ |
| 163 | + |
| 164 | +**Energy:** |
| 165 | +$$ |
| 166 | +\mathbb{E}|X-Y| = 2 \int_0^\infty F(t)(1-F(t)) dt |
| 167 | +$$ |
| 168 | +where $F$ is the Gamma CDF. |
| 169 | + |
| 170 | +**Derivation:** |
| 171 | +Let $F$ be the CDF of Gamma$(k,\theta)$ (here $k=2$, $\theta=1/3$). The general formula applies: |
| 172 | +$$ |
| 173 | +\mathbb{E}|X-Y| = 2 \int_0^\infty F(t)(1-F(t)) dt |
| 174 | +$$ |
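
For $k=2$ and rate $3$ the CDF has an elementary form, so the integral can be evaluated directly. The following worked steps are a sketch added for illustration and agree with the tabulated analytic value:

$$
F(t) = 1 - (1+3t)e^{-3t}, \quad t > 0
$$
$$
\int_0^\infty (1+3t)e^{-3t}\,dt = \frac{2}{3}, \qquad \int_0^\infty (1+3t)^2 e^{-6t}\,dt = \frac{1}{6} + \frac{1}{6} + \frac{1}{12} = \frac{5}{12}
$$
$$
\mathbb{E}|X-Y| = 2\left(\frac{2}{3} - \frac{5}{12}\right) = \frac{1}{2}
$$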
| 175 | +
|
| 176 | +## Example: Logistic(0,1) |
| 177 | +
|
| 178 | +**PDF:** |
| 179 | +$$ |
| 180 | +f(x) = \frac{e^{-x}}{(1+e^{-x})^2}, \quad x \in \mathbb{R} |
| 181 | +$$ |
| 182 | +
|
| 183 | +**Energy:** |
| 184 | +$$ |
| 185 | +\mathbb{E}|X-Y| = 2 \int_{-\infty}^{\infty} F(t)(1-F(t)) dt = 2 |
| 186 | +$$ |
| 187 | +where $F$ is the Logistic CDF. |
| 188 | +
|
| 189 | +**Derivation:** |
| 190 | +For the standard Logistic, $F(t)(1-F(t)) = f(t)$, so the integral equals $\int_{-\infty}^{\infty} f(t) dt = 1$ and the energy is $2 \times 1 = 2$. |
| 191 | +
|
| 192 | +
|
| 193 | +## Example: Pareto(1,3) |
| 194 | +
|
| 195 | +**PDF:** |
| 196 | +$$ |
| 197 | +f(x) = 3 x^{-4}, \quad x > 1 |
| 198 | +$$ |
| 199 | +
|
| 200 | +**Energy:** |
| 201 | +$$ |
| 202 | +\mathbb{E}|X-Y| = 2 \int_1^\infty F(t)(1-F(t)) dt |
| 203 | +$$ |
| 204 | +where $F$ is the Pareto CDF. |
| 205 | +
|
| 206 | +**Derivation:** |
| 207 | +Let $F$ be the CDF of Pareto$(x_m,\alpha)$ ($x_m=1$, $\alpha=3$): |
| 208 | +$$ |
| 209 | +F(t) = 1 - t^{-3}, \quad t \geq 1 |
| 210 | +$$ |
| 211 | +Plug into the general formula. |
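
Carrying out the substitution, as a short worked sketch:

$$
F(t)(1-F(t)) = t^{-3} - t^{-6}, \quad t \geq 1
$$
$$
\mathbb{E}|X-Y| = 2 \int_1^\infty \left(t^{-3} - t^{-6}\right) dt = 2\left(\frac{1}{2} - \frac{1}{5}\right) = \frac{3}{5} = 0.6
$$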
| 212 | +
|
| 213 | +## Example: Inverse Gamma(3,2) |
| 214 | +
|
| 215 | +**PDF:** |
| 216 | +$$ |
| 217 | +f(x) = \frac{2^3 x^{-4} \exp\left(-\frac{2}{x}\right)}{\Gamma(3)}, \quad x > 0 |
| 218 | +$$ |
| 219 | +
|
| 220 | +**Energy:** |
| 221 | +$$ |
| 222 | +\mathbb{E}|X-Y| = 2 \int_0^\infty F(t)(1-F(t)) dt |
| 223 | +$$ |
| 224 | +where $F$ is the Inverse Gamma CDF. |
| 225 | +
|
| 226 | +**Derivation:** |
| 227 | +Let $F$ be the CDF of Inverse Gamma$(\alpha,\beta)$. The energy is: |
| 228 | +$$ |
| 229 | +\mathbb{E}|X-Y| = 2 \int_{0}^{\infty} F(t)(1-F(t)) dt |
| 230 | +$$ |
| 231 | +This follows from the general result for continuous distributions: |
| 232 | +$$ |
| 233 | +\mathbb{E}|X-Y| = 2 \int_{-\infty}^{\infty} F(t)(1-F(t)) dt |
| 234 | +$$ |
| 235 | +where the support is $(0,\infty)$ for Inverse Gamma. |
| 236 | +
|
| 237 | +## Example: LogNormal(0,1) |
| 238 | +
|
| 239 | +**PDF:** |
| 240 | +$$ |
| 241 | +f(x) = \frac{1}{x \sqrt{2\pi}} \exp\left(-\frac{(\log x)^2}{2}\right), \quad x > 0 |
| 242 | +$$ |
| 243 | +
|
| 244 | +**Energy:** |
| 245 | +$$ |
| 246 | +\mathbb{E}|X-Y| = 2 \int_0^\infty F(t)(1-F(t)) dt |
| 247 | +$$ |
| 248 | +where $F$ is the LogNormal CDF with $\mu=0$, $\sigma=1$. |
| 249 | +
|
| 250 | +**Derivation:** |
| 251 | +For LogNormal$(\mu, \sigma)$, the energy is computed using numerical integration of the CDF: |
| 252 | +$$ |
| 253 | +\mathbb{E}|X-Y| = 2 \int_{0}^{\infty} \Phi\left(\frac{\log t - \mu}{\sigma}\right) \left(1 - \Phi\left(\frac{\log t - \mu}{\sigma}\right)\right) dt |
| 254 | +$$ |
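
As a cross-check on the numerical integral, the expected absolute difference of a log-normal can also be written through its mean and Gini coefficient as $\mathbb{E}|X-Y| = 2 e^{\mu + \sigma^2/2}\,\mathrm{erf}(\sigma/2)$. This identity is not taken from the file itself; it is included here as a hedged sketch, and `lognormal_energy` is a hypothetical helper name.

```python
import math


def lognormal_energy(mu=0.0, sigma=1.0):
    """E|X - Y| for i.i.d. LogNormal(mu, sigma), written as 2 * mean * Gini."""
    mean = math.exp(mu + 0.5 * sigma**2)  # mean of the log-normal
    gini = math.erf(sigma / 2.0)          # Gini coefficient of the log-normal
    return 2.0 * mean * gini


print(f"LogNormal(0,1): {lognormal_energy():.6f}")
```

For $\mu=0$, $\sigma=1$ this agrees with the tabulated analytic value to the digits shown.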
| 255 | +
|
| 256 | +## Example: ChiSquared(2) |
| 257 | +
|
| 258 | +**PDF:** |
| 259 | +$$ |
| 260 | +f(x) = \frac{1}{2} e^{-x/2}, \quad x > 0 |
| 261 | +$$ |
| 262 | +
|
| 263 | +**Energy:** |
| 264 | +$$ |
| 265 | +\mathbb{E}|X-Y| = 2 |
| 266 | +$$ |
| 267 | +
|
| 268 | +**Derivation:** |
| 269 | +ChiSquared$(k)$ with $k=2$ is equivalent to Exponential$(1/2)$, and Exponential$(\lambda)$ has energy $1/\lambda = 2$. |
| 270 | +
|
| 271 | +For general ChiSquared$(k)$, the energy is computed using numerical integration. |
| 272 | +
|
| 273 | +## Example: T(0,1,5) |
| 274 | +
|
| 275 | +**PDF:** |
| 276 | +$$ |
| 277 | +f(x) = \frac{\Gamma(3)}{\sqrt{5\pi} \Gamma(2.5)} \left(1 + \frac{x^2}{5}\right)^{-3}, \quad x \in \mathbb{R} |
| 278 | +$$ |
| 279 | +
|
| 280 | +**Energy:** |
| 281 | +$$ |
| 282 | +\mathbb{E}|X-Y| = 2 \int_{-\infty}^{\infty} F(t)(1-F(t)) dt |
| 283 | +$$ |
| 284 | +where $F$ is the t-distribution CDF with 5 degrees of freedom. |
| 285 | +
|
| 286 | +**Derivation:** |
| 287 | +For Student's t-distribution with $\nu$ degrees of freedom, the energy is computed using numerical integration of the CDF. |
| 288 | +
|
| 289 | +--- |
| 290 | +
|
| 291 | +## General Formula |
| 292 | +
|
| 293 | +For a continuous distribution with CDF $F$ and support $S$: |
| 294 | +$$ |
| 295 | +\mathbb{E}|X-Y| = 2 \int_S F(t)(1-F(t)) dt |
| 296 | +$$ |
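
The general formula can be evaluated numerically for any distribution whose CDF is available. The sketch below uses a plain composite trapezoidal rule from the standard library; it is an illustration under stated assumptions, not the integration routine used in skpro. The name `energy_from_cdf`, the grid size, and the upper truncation point are choices made for this example.

```python
import math


def energy_from_cdf(cdf, lower, upper, n=20_000):
    """Approximate 2 * integral of F(t) * (1 - F(t)) over [lower, upper]
    with the composite trapezoidal rule on n equal subintervals."""
    h = (upper - lower) / n

    def g(t):
        p = cdf(t)
        return p * (1.0 - p)

    total = 0.5 * (g(lower) + g(upper))
    for i in range(1, n):
        total += g(lower + i * h)
    return 2.0 * h * total


# Exponential with rate 2 and Weibull(1, 2), both truncated at t = 20,
# beyond which 1 - F(t) is negligible.
exp_energy = energy_from_cdf(lambda t: 1.0 - math.exp(-2.0 * t), 0.0, 20.0)
weibull_energy = energy_from_cdf(lambda t: 1.0 - math.exp(-t * t), 0.0, 20.0)
print(f"Exponential(2): {exp_energy:.4f}")
print(f"Weibull(1,2):   {weibull_energy:.4f}")
```

For heavy-tailed or unbounded supports the truncation point has to be chosen so that $F(t)(1-F(t))$ is negligible outside the integration range.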
| 297 | +
|
| 298 | +--- |
| 299 | +
|
| 300 | +*This file is a temporary collection of analytic energy distance formulae and derivations for skpro distributions, until a more permanent documentation solution is implemented (see #689).* |