Commit f441b44

Differential entropy estimators: various fixes (#211)
* Update docs and conform to `Entropy` API
* Update tests
* Update changelog
* Conform to `Entropy` api. Fix scaling for order statistic estimators.
* Be explicit about default base 2 logarithm.
* Test default `Shannon(; base = 2)`
* Remove `base` from docstring
* Just compare all differential entropy estimators at once
* Just compare all differential entropy estimators at once
* Don't mention estimator that is not merged yet
1 parent c77d9ae commit f441b44

File tree: 20 files changed (+450 −292 lines)

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,8 @@ The API for Entropies.jl has been completely overhauled. Major changes are:
 - Common generic interfaces `entropy`, `entropy_normalized` and `maximum` (maximum entropy) that dispatches on different types of entropies (e.g `Renyi()` `Shannon()`, `Tsallis()`).
 - Convenience functions for common entropies, such as permutation entropy and dispersion entropy.
 - No more deprecation warnings for using the old keyword `α` for Renyi entropy.
+- The `base` of the entropy is now a field of the `Entropy` type, not the estimator.
+  You'll now have to do `entropy(Shannon(; base = 2), est, x)`.
 - An entirely new section of entropy-like complexity measures, such as the reverse dispersion entropy.
 - Many new estimators, such as `SpatialPermutation` and `PowerSpectrum`.
 - Check the online documentation for a comprehensive overview of the changes.
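
To make the changelog entry above concrete, here is a minimal sketch of the new call signature; the data and estimator choice below are illustrative placeholders, not taken from the package docs.

```julia
using Entropies
using DynamicalSystemsBase   # provides `Dataset`, as in the docs examples

x = Dataset(randn(10_000, 2))                   # any multivariate sample (illustrative)
est = Kraskov(; k = 3, w = 0)                   # estimators no longer carry a `base`
h_bits = entropy(Shannon(; base = 2), est, x)   # `base` is now a field of the entropy type
h_nats = entropy(Shannon(; base = MathConstants.e), est, x)
```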

docs/src/entropies.md

Lines changed: 1 addition & 3 deletions
@@ -53,9 +53,7 @@ rely on estimating some density functional.
 
 Each [`EntropyEstimator`](@ref)s uses a specialized technique to approximating relevant
 densities/integrals, and is often tailored to one or a few types of generalized entropy.
-For example, [`Kraskov`](@ref) estimates the [`Shannon`](@ref) entropy, while
-[`LeonenkoProzantoSavani`](@ref) estimates [`Shannon`](@ref), [`Renyi`](@ref), and
-[`Tsallis`](@ref) entropies.
+For example, [`Kraskov`](@ref) estimates the [`Shannon`](@ref) entropy.
 
 | Estimator | Principle | Input data | [`Shannon`](@ref) | [`Renyi`](@ref) | [`Tsallis`](@ref) | [`Kaniadakis`](@ref) | [`Curado`](@ref) | [`StretchedExponential`](@ref) |
 | ---------------------------- | ----------------- | ---------- | :---------------: | :-------------: | :---------------: | :------------------: | :--------------: | :----------------------------: |
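
A hedged sketch of what the compatibility table above encodes, assuming the `Renyi(; q, base)` keyword constructor: `Kraskov` accepts a `Shannon` target, while requesting a Rényi entropy with `q ≠ 1` throws an `ArgumentError`, per the `entropy(e::Renyi, est::Kraskov, ...)` method changed later in this commit.

```julia
using Entropies
using DynamicalSystemsBase   # provides `Dataset`

x = Dataset(randn(5_000, 2))
est = Kraskov(; k = 3, w = 0)
entropy(Shannon(), est, x)       # supported: Kraskov targets the Shannon entropy
entropy(Renyi(; q = 2), est, x)  # throws ArgumentError: only q = 1 is implemented
```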

docs/src/examples.md

Lines changed: 62 additions & 75 deletions
@@ -25,119 +25,105 @@ ax.zticklabelsvisible = false
 fig
 ```
 
-## Differential entropy: nearest neighbors estimators
+## Differential entropy: estimator comparison
 
-Here, we reproduce Figure 1 in Charzyńska & Gambin (2016)[^Charzyńska2016]. Their example
-demonstrates how the [`Kraskov`](@ref) and [`KozachenkoLeonenko`](@ref) nearest neighbor
-based estimators converge towards the true entropy value for increasing time series length.
-We extend their example with [`Zhu`](@ref) and [`ZhuSingh`](@ref) estimators, which are also
-based on nearest neighbor searches.
+Here, we compare how the nearest neighbor differential entropy estimators
+([`Kraskov`](@ref), [`KozachenkoLeonenko`](@ref), [`Zhu`](@ref) and [`ZhuSingh`](@ref))
+converge towards the true entropy value for increasing time series length.
 
-Input data are from a uniform 1D distribution ``U(0, 1)``, for which the true entropy is
-`ln(1 - 0) = 0`).
+Entropies.jl also provides entropy estimators based on
+[order statistics](https://en.wikipedia.org/wiki/Order_statistic). These estimators
+are only defined for scalar-valued vectors, in this example, so we compute these
+estimates separately, and add these estimators ([`Vasicek`](@ref), [`Ebrahimi`](@ref),
+[`AlizadehArghami`](@ref) and [`Correa`](@ref)) to the comparison.
+
+Input data are from a normal 1D distribution ``\mathcal{N}(0, 1)``, for which the true
+entropy is `0.5*log(2π) + 0.5` nats when using natural logarithms.
 
 ```@example MAIN
 using Entropies
 using DynamicalSystemsBase, CairoMakie, Statistics
-using Distributions: Uniform, Normal
+nreps = 30
+Ns = [100:100:500; 1000:1000:10000]
+e = Shannon(; base = MathConstants.e)
 
-# Define estimators
-base = MathConstants.e # shouldn't really matter here, because the target entropy is 0.
+# --------------------------
+# kNN estimators
+# --------------------------
 w = 0 # Theiler window of 0 (only exclude the point itself during neighbor searches)
-estimators = [
+knn_estimators = [
     # with k = 1, Kraskov is virtually identical to
     # Kozachenko-Leonenko, so pick a higher number of neighbors for Kraskov
     Kraskov(; k = 3, w),
     KozachenkoLeonenko(; w),
     Zhu(; k = 3, w),
     ZhuSingh(; k = 3, w),
 ]
-labels = ["KozachenkoLeonenko", "Kraskov", "Zhu", "ZhuSingh"]
 
 # Test each estimator `nreps` times over time series of varying length.
-nreps = 50
-Ns = [100:100:500; 1000:1000:10000]
-
-Hs_uniform = [[zeros(nreps) for N in Ns] for e in estimators]
-for (i, e) in enumerate(estimators)
+Hs_uniform_knn = [[zeros(nreps) for N in Ns] for e in knn_estimators]
+for (i, est) in enumerate(knn_estimators)
     for j = 1:nreps
-        pts = rand(Uniform(0, 1), maximum(Ns)) |> Dataset
+        pts = randn(maximum(Ns)) |> Dataset
         for (k, N) in enumerate(Ns)
-            Hs_uniform[i][k][j] = entropy(e, pts[1:N])
+            Hs_uniform_knn[i][k][j] = entropy(e, est, pts[1:N])
         end
     end
 end
 
-fig = Figure(resolution = (600, length(estimators) * 200))
-for (i, e) in enumerate(estimators)
-    Hs = Hs_uniform[i]
-    ax = Axis(fig[i,1]; ylabel = "h (nats)")
-    lines!(ax, Ns, mean.(Hs); color = Cycled(i), label = labels[i])
-    band!(ax, Ns, mean.(Hs) .+ std.(Hs), mean.(Hs) .- std.(Hs);
-        color = (Main.COLORS[i], 0.5))
-    ylims!(-0.25, 0.25)
-    axislegend()
-end
-
-fig
-```
-
-## Differential entropy: order statistics estimators
-
-Entropies.jl also provides entropy estimators based on
-[order statistics](https://en.wikipedia.org/wiki/Order_statistic). These estimators
-are only defined for scalar-valued vectors, so we pass the data as `Vector{<:Real}`s instead
-of `Dataset`s, as we did for the nearest-neighbor estimators above.
+# --------------------------
+# Order statistic estimators
+# --------------------------
 
-Here, we show how the [`Vasicek`](@ref), [`Ebrahimi`](@ref), [`AlizadehArghami`](@ref)
-and [`Correa`](@ref) direct [`Shannon`](@ref) entropy estimators, with increasing sample size,
-approach zero for samples from a uniform distribution on `[0, 1]`. The true entropy value in
-nats for this distribution is `ln(1 - 0) = 0`.
-
-```@example MAIN
-using Entropies
-using Statistics
-using Distributions: Uniform
-using CairoMakie
-
-# Define estimators
-base = MathConstants.e # shouldn't really matter here, because the target entropy is 0.
-# just provide types here, they are instantiated inside the loop
-estimators = [Vasicek, Ebrahimi, AlizadehArghami, Correa]
-labels = ["Vasicek", "Ebrahimi", "AlizadehArghami", "Correa"]
-
-# Test each estimator `nreps` times over time series of varying length.
-Ns = [100:100:500; 1000:1000:10000]
-nreps = 30
-
-Hs_uniform = [[zeros(nreps) for N in Ns] for e in estimators]
-for (i, e) in enumerate(estimators)
+# Just provide types here, they are instantiated inside the loop
+estimators_os = [Vasicek, Ebrahimi, AlizadehArghami, Correa]
+Hs_uniform_os = [[zeros(nreps) for N in Ns] for e in estimators_os]
+for (i, est_os) in enumerate(estimators_os)
     for j = 1:nreps
-        pts = rand(Uniform(0, 1), maximum(Ns)) # raw timeseries, not a `Dataset`
+        pts = randn(maximum(Ns)) # raw timeseries, not a `Dataset`
         for (k, N) in enumerate(Ns)
            m = floor(Int, N / 100) # Scale `m` to timeseries length
-            est = e(; m, base) # Instantiate estimator with current `m`
-            Hs_uniform[i][k][j] = entropy(est, pts[1:N])
+            est = est_os(; m) # Instantiate estimator with current `m`
+            Hs_uniform_os[i][k][j] = entropy(e, est, pts[1:N])
        end
    end
 end
 
-fig = Figure(resolution = (600, length(estimators) * 200))
-for (i, e) in enumerate(estimators)
-    Hs = Hs_uniform[i]
+# -------------
+# Plot results
+# -------------
+fig = Figure(resolution = (700, 8 * 200))
+labels_knn = ["KozachenkoLeonenko", "Kraskov", "Zhu", "ZhuSingh"]
+labels_os = ["Vasicek", "Ebrahimi", "AlizadehArghami", "Correa"]
+
+for (i, e) in enumerate(knn_estimators)
+    Hs = Hs_uniform_knn[i]
     ax = Axis(fig[i,1]; ylabel = "h (nats)")
-    lines!(ax, Ns, mean.(Hs); color = Cycled(i), label = labels[i])
-    band!(ax, Ns, mean.(Hs) .+ std.(Hs), mean.(Hs) .- std.(Hs);
-        color = (Main.COLORS[i], 0.5))
-    ylims!(-0.25, 0.25)
+    lines!(ax, Ns, mean.(Hs); color = Cycled(i), label = labels_knn[i])
+    band!(ax, Ns, mean.(Hs) .+ std.(Hs), mean.(Hs) .- std.(Hs); alpha = 0.5,
+        color = (Main.COLORS[i], 0.5))
+    hlines!(ax, [(0.5*log(2π) + 0.5)], color = :black, lw = 5, linestyle = :dash)
+
+    ylims!(1.2, 1.6)
+    axislegend()
+end
+
+for (i, e) in enumerate(estimators_os)
+    Hs = Hs_uniform_os[i]
+    ax = Axis(fig[i + length(knn_estimators),1]; ylabel = "h (nats)")
+    lines!(ax, Ns, mean.(Hs); color = Cycled(i), label = labels_os[i])
+    band!(ax, Ns, mean.(Hs) .+ std.(Hs), mean.(Hs) .- std.(Hs), alpha = 0.5,
+        color = (Main.COLORS[i], 0.5))
+    hlines!(ax, [(0.5*log(2π) + 0.5)], color = :black, lw = 5, linestyle = :dash)
+    ylims!(1.2, 1.6)
     axislegend()
 end
 
 fig
 ```
 
-As for the nearest neighbor estimators, both estimators also approach the
-true entropy value for this example, but is negatively biased for small sample sizes.
+All estimators approach the true differential entropy, but those based on order statistics
+are negatively biased for small sample sizes.
 
 ## Discrete entropy: permutation entropy
 
@@ -315,6 +301,7 @@ using Entropies
 using DynamicalSystemsBase
 using Random
 using CairoMakie
+using Distributions: Normal
 
 n = 1000
 ts = 1:n
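
The revised example compares every estimate against the closed-form value `0.5*log(2π) + 0.5` nats. As a small stand-alone check using the same API (a sketch, not part of the docs), a single order-statistics estimate on a raw vector looks like this:

```julia
using Entropies

h_true = 0.5 * log(2π) + 0.5              # ≈ 1.4189 nats for N(0, 1)
x = randn(10_000)                         # order-statistic estimators take raw vectors
m = floor(Int, length(x) / 100)           # scale `m` to the sample size, as in the example
h_est = entropy(Shannon(; base = MathConstants.e), Vasicek(; m), x)
```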

src/entropies/estimators/nearest_neighbors/KozachenkoLeonenko.jl

Lines changed: 22 additions & 9 deletions
@@ -2,36 +2,49 @@ export KozachenkoLeonenko
 
 """
     KozachenkoLeonenko <: EntropyEstimator
-    KozachenkoLeonenko(; k::Int = 1, w::Int = 1, base = 2)
+    KozachenkoLeonenko(; k::Int = 1, w::Int = 1)
 
 The `KozachenkoLeonenko` estimator computes the [`Shannon`](@ref) differential
-[`entropy`](@ref) of `x` (a multi-dimensional `Dataset`) to the given `base`, based on
-nearest neighbor searches using the method from Kozachenko & Leonenko
-(1987)[^KozachenkoLeonenko1987], as described in Charzyńska and Gambin[^Charzyńska2016].
+[`entropy`](@ref) of `x` (a multi-dimensional `Dataset`).
+
+## Description
+
+Assume we have samples ``\\{\\bf{x}_1, \\bf{x}_2, \\ldots, \\bf{x}_N \\}`` from a
+continuous random variable ``X \\in \\mathbb{R}^d`` with support ``\\mathcal{X}`` and
+density function``f : \\mathbb{R}^d \\to \\mathbb{R}``. `KozachenkoLeonenko` estimates
+the [Shannon](@ref) differential entropy
+
+```math
+H(X) = \\int_{\\mathcal{X}} f(x) \\log f(x) dx = \\mathbb{E}[-\\log(f(X))]
+```
+
+using the nearest neighbor method from Kozachenko &
+Leonenko (1987)[^KozachenkoLeonenko1987], as described in Charzyńska and
+Gambin[^Charzyńska2016].
 
 `w` is the Theiler window, which determines if temporal neighbors are excluded
 during neighbor searches (defaults to `0`, meaning that only the point itself is excluded
 when searching for neighbours).
 
 In contrast to [`Kraskov`](@ref), this estimator uses only the *closest* neighbor.
 
-See also: [`entropy`](@ref).
+
+See also: [`entropy`](@ref), [`Kraskov`](@ref), [`EntropyEstimator`](@ref).
 
 [^Charzyńska2016]: Charzyńska, A., & Gambin, A. (2016). Improvement of the k-NN entropy
     estimator with applications in systems biology. Entropy, 18(1), 13.
 [^KozachenkoLeonenko1987]: Kozachenko, L. F., & Leonenko, N. N. (1987). Sample estimate of
     the entropy of a random vector. Problemy Peredachi Informatsii, 23(2), 9-16.
 """
-@Base.kwdef struct KozachenkoLeonenko{B} <: EntropyEstimator
+@Base.kwdef struct KozachenkoLeonenko <: EntropyEstimator
     w::Int = 1
-    base::B = 2
 end
 
 function entropy(e::Renyi, est::KozachenkoLeonenko, x::AbstractDataset{D, T}) where {D, T}
    e.q == 1 || throw(ArgumentError(
        "Renyi entropy with q = $(e.q) not implemented for $(typeof(est)) estimator"
    ))
-    (; w, base) = est
+    (; w) = est
 
    N = length(x)
    ρs = maximum_neighbor_distances(x, w, 1)
@@ -40,5 +53,5 @@ function entropy(e::Renyi, est::KozachenkoLeonenko, x::AbstractDataset{D, T}) wh
        log(MathConstants.e, ball_volume(D)) +
        MathConstants.eulergamma +
        log(MathConstants.e, N - 1)
-    return h / log(base, MathConstants.e) # Convert to target unit
+    return h / log(e.base, MathConstants.e) # Convert to target unit
 end
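
A usage sketch for the updated estimator (data and parameters are illustrative). Per the comment in the comparison example above, `Kraskov` with `k = 1` is expected to behave almost identically to `KozachenkoLeonenko`, which always uses the single closest neighbor:

```julia
using Entropies
using DynamicalSystemsBase   # provides `Dataset`

x = Dataset(randn(10_000, 2))
e = Shannon(; base = MathConstants.e)
h_kl = entropy(e, KozachenkoLeonenko(; w = 0), x)
h_k1 = entropy(e, Kraskov(; k = 1, w = 0), x)  # expected to be close to h_kl
```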

src/entropies/estimators/nearest_neighbors/Kraskov.jl

Lines changed: 17 additions & 7 deletions
@@ -2,38 +2,48 @@ export Kraskov
 
 """
    Kraskov <: EntropyEstimator
-    Kraskov(; k::Int = 1, w::Int = 1, base = 2)
+    Kraskov(; k::Int = 1, w::Int = 1)
 
 The `Kraskov` estimator computes the [`Shannon`](@ref) differential [`entropy`](@ref) of `x`
-(a multi-dimensional `Dataset`) to the given `base`, using the `k`-th nearest neighbor
+(a multi-dimensional `Dataset`) using the `k`-th nearest neighbor
 searches method from [^Kraskov2004].
 
 `w` is the Theiler window, which determines if temporal neighbors are excluded
 during neighbor searches (defaults to `0`, meaning that only the point itself is excluded
 when searching for neighbours).
 
-See also: [`entropy`](@ref), [`KozachenkoLeonenko`](@ref).
+## Description
+
+Assume we have samples ``\\{\\bf{x}_1, \\bf{x}_2, \\ldots, \\bf{x}_N \\}`` from a
+continuous random variable ``X \\in \\mathbb{R}^d`` with support ``\\mathcal{X}`` and
+density function``f : \\mathbb{R}^d \\to \\mathbb{R}``. `Kraskov` estimates the
+[Shannon](@ref) differential entropy
+
+```math
+H(X) = \\int_{\\mathcal{X}} f(x) \\log f(x) dx = \\mathbb{E}[-\\log(f(X))].
+```
+
+See also: [`entropy`](@ref), [`KozachenkoLeonenko`](@ref), [`EntropyEstimator`](@ref).
 
 [^Kraskov2004]:
    Kraskov, A., Stögbauer, H., & Grassberger, P. (2004).
    Estimating mutual information. Physical review E, 69(6), 066138.
 """
-Base.@kwdef struct Kraskov{B} <: EntropyEstimator
+Base.@kwdef struct Kraskov <: EntropyEstimator
    k::Int = 1
    w::Int = 1
-    base::B = 2
 end
 
 function entropy(e::Renyi, est::Kraskov, x::AbstractDataset{D, T}) where {D, T}
    e.q == 1 || throw(ArgumentError(
        "Renyi entropy with q = $(e.q) not implemented for $(typeof(est)) estimator"
    ))
-    (; k, w, base) = est
+    (; k, w) = est
    N = length(x)
    ρs = maximum_neighbor_distances(x, w, k)
    # The estimated entropy has "unit" [nats]
    h = -digamma(k) + digamma(N) +
        log(MathConstants.e, ball_volume(D)) +
        D/N*sum(log.(MathConstants.e, ρs))
-    return h / log(base, MathConstants.e) # Convert to target unit
+    return h / log(e.base, MathConstants.e) # Convert to target unit
 end
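
The expectation form ``\mathbb{E}[-\log f(X)]`` in the new docstring can be sanity-checked numerically. A minimal sketch, independent of the package, for a standard normal (whose differential entropy is `0.5*log(2π) + 0.5` nats):

```julia
using Statistics

f(x) = exp(-x^2 / 2) / sqrt(2π)   # density of N(0, 1)
xs = randn(10^6)
h_mc = mean(-log.(f.(xs)))        # Monte Carlo estimate of E[-log f(X)]
h_true = 0.5 * log(2π) + 0.5      # closed form, ≈ 1.4189 nats
```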

src/entropies/estimators/nearest_neighbors/Zhu.jl

Lines changed: 18 additions & 9 deletions
@@ -4,18 +4,27 @@ export Zhu
     Zhu <: EntropyEstimator
     Zhu(k = 1, w = 0)
 
-The `Zhu` estimator (Zhu et al., 2015)[^Zhu2015] computes the [`Shannon`](@ref)
-differential [`entropy`](@ref) of `x` (a multi-dimensional `Dataset`), by
-approximating probabilities within hyperrectangles surrounding each point `xᵢ ∈ x` using
-using `k` nearest neighbor searches.
+The `Zhu` estimator (Zhu et al., 2015)[^Zhu2015] is an extension to
+[`KozachenkoLeonenko`](@ref), and computes the [`Shannon`](@ref)
+differential [`entropy`](@ref) of `x` (a multi-dimensional `Dataset`).
 
-`w` is the Theiler window, which determines if temporal neighbors are excluded
-during neighbor searches (defaults to `0`, meaning that only the point itself is excluded
-when searching for neighbours).
+## Description
 
-This estimator is an extension to [`KozachenkoLeonenko`](@ref).
+Assume we have samples ``\\{\\bf{x}_1, \\bf{x}_2, \\ldots, \\bf{x}_N \\}`` from a
+continuous random variable ``X \\in \\mathbb{R}^d`` with support ``\\mathcal{X}`` and
+density function``f : \\mathbb{R}^d \\to \\mathbb{R}``. `Zhu` estimates the [Shannon](@ref)
+differential entropy
 
-See also: [`entropy`](@ref).
+```math
+H(X) = \\int_{\\mathcal{X}} f(x) \\log f(x) dx = \\mathbb{E}[-\\log(f(X))]
+```
+
+by approximating densities within hyperrectangles surrounding each point `xᵢ ∈ x` using
+using `k` nearest neighbor searches. `w` is the Theiler window, which determines if
+temporal neighbors are excluded during neighbor searches (defaults to `0`, meaning that
+only the point itself is excluded when searching for neighbours).
+
+See also: [`entropy`](@ref), [`KozachenkoLeonenko`](@ref), [`EntropyEstimator`](@ref).
 
 [^Zhu2015]:
     Zhu, J., Bellanger, J. J., Shu, H., & Le Bouquin Jeannès, R. (2015). Contribution to

src/entropies/estimators/nearest_neighbors/ZhuSingh.jl

Lines changed: 11 additions & 0 deletions
@@ -12,6 +12,17 @@ export ZhuSingh
 The `ZhuSingh` estimator (Zhu et al., 2015)[^Zhu2015] computes the [`Shannon`](@ref)
 differential [`entropy`](@ref) of `x` (a multi-dimensional `Dataset`).
 
+## Description
+
+Assume we have samples ``\\{\\bf{x}_1, \\bf{x}_2, \\ldots, \\bf{x}_N \\}`` from a
+continuous random variable ``X \\in \\mathbb{R}^d`` with support ``\\mathcal{X}`` and
+density function``f : \\mathbb{R}^d \\to \\mathbb{R}``. `ZhuSingh` estimates the
+[Shannon](@ref) differential entropy
+
+```math
+H(X) = \\int_{\\mathcal{X}} f(x) \\log f(x) dx = \\mathbb{E}[-\\log(f(X))].
+```
+
 Like [`Zhu`](@ref), this estimator approximates probabilities within hyperrectangles
 surrounding each point `xᵢ ∈ x` using using `k` nearest neighbor searches. However,
 it also considers the number of neighbors falling on the borders of these hyperrectangles.
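
A hedged usage sketch contrasting the two estimators from Zhu et al. (2015): both approximate densities with hyperrectangles built from `k` nearest neighbors, but `ZhuSingh` additionally counts neighbors falling on the hyperrectangle borders. Data and parameters below are illustrative only:

```julia
using Entropies
using DynamicalSystemsBase   # provides `Dataset`

x = Dataset(randn(5_000, 3))
e = Shannon(; base = MathConstants.e)
h_zhu      = entropy(e, Zhu(; k = 3, w = 0), x)
h_zhusingh = entropy(e, ZhuSingh(; k = 3, w = 0), x)
```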
