Skip to content

Commit 1022484

Browse files
authored
[ENH] analytic computations for energy of some distributions (#691)
#### Reference Issues/PRs Towards #267 #### What does this implement/fix? Explain your changes. Analytic energy formulas for InverseGamma, InverseGaussian, LogGamma, TruncatedNormal, Poisson. All formulas validated against Monte Carlo estimates. Fixes broadcasting, DataFrame, and scalar return shape. See docstrings for formulas.
1 parent f1c6d57 commit 1022484

File tree

9 files changed

+864
-5
lines changed

9 files changed

+864
-5
lines changed

skpro/distributions/chi_squared.py

Lines changed: 48 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,9 @@
33

44
__author__ = ["sukjingitsit"]
55

6+
import numpy as np
67
import pandas as pd
8+
from scipy.integrate import quad
79
from scipy.stats.distributions import chi2
810

911
from skpro.distributions.base import BaseDistribution
@@ -31,7 +33,7 @@ class ChiSquared(BaseDistribution):
3133
"authors": "sukjingitsit",
3234
# estimator tags
3335
# --------------
34-
"capabilities:exact": ["mean", "var", "pdf", "log_pdf", "cdf", "ppf"],
36+
"capabilities:exact": ["mean", "var", "energy", "pdf", "log_pdf", "cdf", "ppf"],
3537
"distr:measuretype": "continuous",
3638
"distr:paramtype": "parametric",
3739
"broadcast_init": "on",
@@ -147,6 +149,51 @@ def _ppf(self, p):
147149
icdf_arr = chi2.ppf(p, dof)
148150
return icdf_arr
149151

152+
def _energy_self(self):
153+
r"""Energy of self, w.r.t. self.
154+
155+
Uses deterministic 1D quadrature:
156+
\mathbb{E}|X-Y| = 2 \int_0^\infty F(t)(1-F(t)) dt,
157+
where F is the ChiSquared CDF.
158+
"""
159+
dof = self._bc_params["dof"]
160+
161+
def self_energy_cell(k):
162+
integral, _ = quad(
163+
lambda t: chi2.cdf(t, k) * (1 - chi2.cdf(t, k)), 0, np.inf, limit=200
164+
)
165+
return 2 * integral
166+
167+
vec_energy = np.vectorize(self_energy_cell)
168+
energy_arr = vec_energy(dof)
169+
if np.ndim(energy_arr) > 1:
170+
energy_arr = energy_arr.sum(axis=1)
171+
return energy_arr
172+
173+
def _energy_x(self, x):
174+
r"""Energy of self, w.r.t. a constant frame x.
175+
176+
Closed form implementation based on the formula:
177+
For x <= 0: energy(x) = k + |x|
178+
For x > 0: energy(x) = x*(2*CDF(k,x)-1) + k - 2*k*CDF(k+1,x)
179+
where k = degrees of freedom.
180+
"""
181+
dof = self._bc_params["dof"]
182+
183+
def energy_cell(k, xi):
184+
if xi <= 0:
185+
return k + abs(xi)
186+
else:
187+
cdf_k = chi2.cdf(xi, k)
188+
cdf_k1 = chi2.cdf(xi, k + 1)
189+
return xi * (2 * cdf_k - 1) + k - 2 * k * cdf_k1
190+
191+
vec_energy = np.vectorize(energy_cell)
192+
energy_arr = vec_energy(dof, x)
193+
if np.ndim(energy_arr) > 1:
194+
energy_arr = energy_arr.sum(axis=1)
195+
return energy_arr
196+
150197
@classmethod
151198
def get_test_params(cls, parameter_set="default"):
152199
"""Return testing parameter settings for the estimator."""
Lines changed: 300 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,300 @@
1+
# Energy Distance Formulae for Probability Distributions
2+
3+
This file collects analytic formulae, derivations, and summary tables for energy distance calculations ($\mathbb{E}|X-Y|$) for probability distributions implemented in skpro.
4+
5+
---
6+
7+
## Summary Table (energy_report.tex)
8+
9+
| Distribution | Analytic | Monte Carlo | Abs Error |
10+
|---------------------|-----------|-------------|-----------|
11+
| Beta(2,3) | 0.228571 | 0.228128 | 0.000444 |
12+
| ChiSquared(2) | 2.000000 | 2.000000 | 0.000000 |
13+
| Exponential(2) | 0.500000 | 0.500166 | 0.000166 |
14+
| Gamma(2,3) | 0.500000 | 0.497054 | 0.002946 |
15+
| Logistic(0,1) | 2.000000 | 1.997396 | 0.002604 |
16+
| LogNormal(0,1) | 1.716318 | 1.716318 | 0.000000 |
17+
| Pareto(1,3) | 0.600000 | 0.599371 | 0.000629 |
18+
| T(0,1,5) | 1.383983 | 1.383983 | 0.000000 |
19+
| Weibull(1,2) | 0.519140 | 0.521159 | 0.002019 |
20+
21+
## Summary Table (energy_report2.tex)
22+
23+
| Distribution | Analytic | Monte Carlo | Abs Error |
24+
|-------------------------------|-----------|-------------|-----------|
25+
| InverseGamma(3,2) | 0.750000 | 0.749457 | 0.000543 |
26+
| InverseGaussian(2,1) | 2.272415 | 2.278161 | 0.005746 |
27+
| LogGamma(2) | 0.886294 | 0.692743 | 0.193551 |
28+
| Poisson(3) | 1.907392 | 1.910830 | 0.003438 |
29+
| TruncatedNormal(0,1,-1,2) | 0.824430 | 0.824881 | 0.000451 |
30+
31+
## Example: Beta(2,3)
32+
33+
**PDF:**
34+
$$
35+
f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \quad x \in (0,1)
36+
$$
37+
38+
**Energy:**
39+
$$
40+
\mathbb{E}|X-Y| = 2 \int_0^1 F(t)(1-F(t)) dt
41+
$$
42+
where $F$ is the Beta CDF.
43+
44+
**Derivation:**
45+
Let $F$ be the CDF of Beta$(\alpha,\beta)$. The energy is:
46+
$$
47+
\mathbb{E}|X-Y| = 2 \int_{-\infty}^{\infty} F(t)(1-F(t)) dt
48+
$$
49+
For Beta, the support is $(0,1)$, so:
50+
51+
## Example: Weibull(1,2)
52+
53+
**PDF:**
54+
$$
55+
f(x) = 2x e^{-x^2}, \quad x > 0
56+
$$
57+
58+
**Energy:**
59+
$$
60+
\mathbb{E}|X-Y| = 2 \int_0^\infty F(t)(1-F(t)) dt
61+
$$
62+
where $F$ is the Weibull CDF.
63+
64+
**Derivation:**
65+
Let $F$ be the CDF of Weibull$(\lambda, k)$ ($\lambda=1$, $k=2$):
66+
$$
67+
F(t) = 1 - e^{-t^2}
68+
$$
69+
Plug into the general formula.
70+
71+
## Example: Inverse Gaussian(2,1)
72+
73+
**PDF:**
74+
$$
75+
f(x) = \left(\frac{1}{2\pi x^3}\right)^{1/2} \exp\left(-\frac{(x-2)^2}{2 \cdot 2^2 x}\right), \quad x > 0
76+
$$
77+
78+
**Energy:**
79+
$$
80+
\mathbb{E}|X-Y| = 2 \int_0^\infty F(t)(1-F(t)) dt
81+
$$
82+
where $F$ is the Inverse Gaussian CDF.
83+
84+
**Derivation:**
85+
Plug the CDF of Inverse Gaussian into the general formula.
86+
87+
## Example: LogGamma(2)
88+
89+
**PDF:**
90+
$$
91+
f(x) = \frac{1}{\Gamma(2)} e^{2x - e^x}, \quad x \in \mathbb{R}
92+
$$
93+
94+
**Energy:**
95+
$$
96+
\mathbb{E}|X-Y| = 2 \int_{-\infty}^{\infty} F(t)(1-F(t)) dt
97+
$$
98+
where $F$ is the LogGamma CDF.
99+
100+
**Derivation:**
101+
Plug the CDF of LogGamma into the general formula.
102+
103+
## Example: Poisson(3)
104+
105+
**PMF:**
106+
$$
107+
P(X = k) = \frac{3^k e^{-3}}{k!}, \quad k = 0,1,2,...
108+
$$
109+
110+
**Energy:**
111+
$$
112+
\mathbb{E}|X-Y| = \sum_{k=0}^\infty \sum_{l=0}^\infty |k-l| P(X=k)P(Y=l)
113+
$$
114+
where $X, Y \sim \text{Poisson}(3)$ i.i.d.
115+
116+
**Derivation:**
117+
For discrete distributions, the energy is the expected absolute difference between two independent samples.
118+
119+
120+
## Example: Truncated Normal(0,1,-1,2)
121+
122+
**PDF:**
123+
$$
124+
f(x) = \frac{\phi(x)}{\Phi(2) - \Phi(-1)}, \quad x \in (-1,2)
125+
$$
126+
where $\phi$ is the standard normal PDF and $\Phi$ is the CDF.
127+
128+
**Energy:**
129+
$$
130+
\mathbb{E}|X-Y| = 2 \int_{-1}^{2} F(t)(1-F(t)) dt
131+
$$
132+
where $F$ is the CDF of the truncated normal.
133+
134+
**Derivation:**
135+
Plug the CDF of the truncated normal into the general formula, integrating over the truncated support.
136+
137+
## Example: Exponential(2)
138+
139+
**PDF:**
140+
$$
141+
f(x) = 2 e^{-2x}, \quad x > 0
142+
$$
143+
144+
**Energy:**
145+
$$
146+
\mathbb{E}|X-Y| = \frac{1}{\lambda}
147+
$$
148+
For Exponential$(\lambda)$, $\lambda=2$, so $\mathbb{E}|X-Y| = 0.5$.
149+
150+
**Derivation:**
151+
For $X, Y \sim \text{Exp}(\lambda)$ i.i.d.,
152+
$$
153+
\mathbb{E}|X-Y| = \frac{1}{\lambda}
154+
$$
155+
This follows from integrating the absolute difference of two independent exponentials.
156+
157+
## Example: Gamma(2,3)
158+
159+
**PDF:**
160+
$$
161+
f(x) = \frac{3^2}{\Gamma(2)} x^{2-1} e^{-3x} = 9x e^{-3x}, \quad x > 0
162+
$$
163+
164+
**Energy:**
165+
$$
166+
\mathbb{E}|X-Y| = 2 \int_0^\infty F(t)(1-F(t)) dt
167+
$$
168+
where $F$ is the Gamma CDF.
169+
170+
**Derivation:**
171+
Let $F$ be the CDF of Gamma$(k,\theta)$ (here $k=2$, $\theta=1/3$). The general formula applies:
172+
$$
173+
\mathbb{E}|X-Y| = 2 \int_0^\infty F(t)(1-F(t)) dt
174+
$$i
175+
176+
## Example: Logistic(0,1)
177+
178+
**PDF:**
179+
$$
180+
f(x) = \frac{e^{-x}}{(1+e^{-x})^2}, \quad x \in \mathbb{R}
181+
$$
182+
183+
**Energy:**
184+
$$
185+
\mathbb{E}|X-Y| = 2 \int_{-\infty}^{\infty} F(t)(1-F(t)) dt = 2
186+
$$
187+
where $F$ is the Logistic CDF.
188+
189+
**Derivation:**
190+
For standard Logistic, the integral evaluates to 1, so $2 \times 1 = 2$.
191+
192+
193+
## Example: Pareto(1,3)
194+
195+
**PDF:**
196+
$$
197+
f(x) = 3 x^{-4}, \quad x > 1
198+
$$
199+
200+
**Energy:**
201+
$$
202+
\mathbb{E}|X-Y| = 2 \int_1^\infty F(t)(1-F(t)) dt
203+
$$
204+
where $F$ is the Pareto CDF.
205+
206+
**Derivation:**
207+
Let $F$ be the CDF of Pareto$(x_m,\alpha)$ ($x_m=1$, $\alpha=3$):
208+
$$
209+
F(t) = 1 - t^{-3}, \quad t \geq 1
210+
$$
211+
Plug into the general formula.
212+
213+
## Example: Inverse Gamma(3,2)
214+
215+
**PDF:**
216+
$$
217+
f(x) = \frac{2^3 x^{-4} \exp\left(-\frac{2}{x}\right)}{\Gamma(3)}, \quad x > 0
218+
$$
219+
220+
**Energy:**
221+
$$
222+
\mathbb{E}|X-Y| = 2 \int_0^\infty F(t)(1-F(t)) dt
223+
$$
224+
where $F$ is the Inverse Gamma CDF.
225+
226+
**Derivation:**
227+
Let $F$ be the CDF of Inverse Gamma$(\alpha,\beta)$. The energy is:
228+
$$
229+
\mathbb{E}|X-Y| = 2 \int_{0}^{\infty} F(t)(1-F(t)) dt
230+
$$
231+
This follows from the general result for continuous distributions:
232+
$$
233+
\mathbb{E}|X-Y| = 2 \int_{-\infty}^{\infty} F(t)(1-F(t)) dt
234+
$$
235+
where the support is $(0,\infty)$ for Inverse Gamma.
236+
237+
## Example: LogNormal(0,1)
238+
239+
**PDF:**
240+
$$
241+
f(x) = \frac{1}{x \sqrt{2\pi}} \exp\left(-\frac{(\log x)^2}{2}\right), \quad x > 0
242+
$$
243+
244+
**Energy:**
245+
$$
246+
\mathbb{E}|X-Y| = 2 \int_0^\infty F(t)(1-F(t)) dt
247+
$$
248+
where $F$ is the LogNormal CDF with $\mu=0$, $\sigma=1$.
249+
250+
**Derivation:**
251+
For LogNormal$(\mu, \sigma)$, the energy is computed using numerical integration of the CDF:
252+
$$
253+
\mathbb{E}|X-Y| = 2 \int_{0}^{\infty} \Phi\left(\frac{\log t - \mu}{\sigma}\right) \left(1 - \Phi\left(\frac{\log t - \mu}{\sigma}\right)\right) dt
254+
$$
255+
256+
## Example: ChiSquared(2)
257+
258+
**PDF:**
259+
$$
260+
f(x) = \frac{1}{2} e^{-x/2}, \quad x > 0
261+
$$
262+
263+
**Energy:**
264+
$$
265+
\mathbb{E}|X-Y| = 2
266+
$$
267+
268+
**Derivation:**
269+
ChiSquared$(k)$ with $k=2$ is equivalent to Exponential$(1/2)$, and Exponential$(\lambda)$ has energy $1/\lambda = 2$.
270+
271+
For general ChiSquared$(k)$, the energy is computed using numerical integration.
272+
273+
## Example: T(0,1,5)
274+
275+
**PDF:**
276+
$$
277+
f(x) = \frac{\Gamma(3)}{\sqrt{5\pi} \Gamma(2.5)} \left(1 + \frac{x^2}{5}\right)^{-3}, \quad x \in \mathbb{R}
278+
$$
279+
280+
**Energy:**
281+
$$
282+
\mathbb{E}|X-Y| = 2 \int_{-\infty}^{\infty} F(t)(1-F(t)) dt
283+
$$
284+
where $F$ is the t-distribution CDF with 5 degrees of freedom.
285+
286+
**Derivation:**
287+
For Student's t-distribution with $\nu$ degrees of freedom, the energy is computed using numerical integration of the CDF.
288+
289+
---
290+
291+
## General Formula
292+
293+
For a continuous distribution with CDF $F$ and support $S$:
294+
$$
295+
\mathbb{E}|X-Y| = 2 \int_S F(t)(1-F(t)) dt
296+
$$
297+
298+
---
299+
300+
*This file is a temporary collection of analytic energy distance formulae and derivations for skpro distributions, until a more permanent documentation solution is implemented (see #689).*

0 commit comments

Comments
 (0)