@@ -25,119 +25,105 @@ ax.zticklabelsvisible = false
fig
```
- ## Differential entropy: nearest neighbors estimators
+ ## Differential entropy: estimator comparison

- Here, we reproduce Figure 1 in Charzyńska & Gambin (2016)[^Charzyńska2016]. Their example
- demonstrates how the [`Kraskov`](@ref) and [`KozachenkoLeonenko`](@ref) nearest neighbor
- based estimators converge towards the true entropy value for increasing time series length.
- We extend their example with [`Zhu`](@ref) and [`ZhuSingh`](@ref) estimators, which are also
- based on nearest neighbor searches.
+ Here, we compare how the nearest neighbor differential entropy estimators
+ ([`Kraskov`](@ref), [`KozachenkoLeonenko`](@ref), [`Zhu`](@ref) and [`ZhuSingh`](@ref))
+ converge towards the true entropy value for increasing time series length.

- Input data are from a uniform 1D distribution ``U(0, 1)``, for which the true entropy is
- `ln(1 - 0) = 0`).
+ Entropies.jl also provides entropy estimators based on
+ [order statistics](https://en.wikipedia.org/wiki/Order_statistic). These estimators
+ are only defined for scalar-valued vectors, so in this example we compute these
+ estimates separately, and add these estimators ([`Vasicek`](@ref), [`Ebrahimi`](@ref),
+ [`AlizadehArghami`](@ref) and [`Correa`](@ref)) to the comparison.
+
+ Input data are drawn from a 1D normal distribution ``\mathcal{N}(0, 1)``, for which the true
+ entropy is `0.5*log(2π) + 0.5 ≈ 1.42` nats (we use natural logarithms throughout).

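+ This value is marked with a dashed line in the figure below. It follows from the standard
+ closed-form expression for the differential entropy of a Gaussian (a textbook result, stated
+ here only for reference):
+
+ ```math
+ h\big(\mathcal{N}(\mu, \sigma^2)\big) = \frac{1}{2}\ln(2\pi e \sigma^2),
+ ```
+
+ which for ``\sigma = 1`` reduces to ``\frac{1}{2}\ln(2\pi) + \frac{1}{2} \approx 1.42`` nats.
+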
```@example MAIN
using Entropies
using DynamicalSystemsBase, CairoMakie, Statistics
- using Distributions: Uniform, Normal
+ nreps = 30
+ Ns = [100:100:500; 1000:1000:10000]
+ e = Shannon(; base = MathConstants.e)

- # Define estimators
- base = MathConstants.e # shouldn't really matter here, because the target entropy is 0.
+ # --------------------------
+ # kNN estimators
+ # --------------------------
w = 0 # Theiler window of 0 (only exclude the point itself during neighbor searches)
- estimators = [
+ knn_estimators = [
    # with k = 1, Kraskov is virtually identical to
    # Kozachenko-Leonenko, so pick a higher number of neighbors for Kraskov
    Kraskov(; k = 3, w),
    KozachenkoLeonenko(; w),
    Zhu(; k = 3, w),
    ZhuSingh(; k = 3, w),
]
- labels = ["KozachenkoLeonenko", "Kraskov", "Zhu", "ZhuSingh"]

# Test each estimator `nreps` times over time series of varying length.
- nreps = 50
- Ns = [100:100:500; 1000:1000:10000]
-
- Hs_uniform = [[zeros(nreps) for N in Ns] for e in estimators]
- for (i, e) in enumerate(estimators)
+ Hs_uniform_knn = [[zeros(nreps) for N in Ns] for e in knn_estimators]
+ for (i, est) in enumerate(knn_estimators)
    for j = 1:nreps
-         pts = rand(Uniform(0, 1), maximum(Ns)) |> Dataset
+         pts = randn(maximum(Ns)) |> Dataset
        for (k, N) in enumerate(Ns)
-             Hs_uniform[i][k][j] = entropy(e, pts[1:N])
+             Hs_uniform_knn[i][k][j] = entropy(e, est, pts[1:N])
        end
    end
end

- fig = Figure(resolution = (600, length(estimators) * 200))
- for (i, e) in enumerate(estimators)
-     Hs = Hs_uniform[i]
-     ax = Axis(fig[i,1]; ylabel = "h (nats)")
-     lines!(ax, Ns, mean.(Hs); color = Cycled(i), label = labels[i])
-     band!(ax, Ns, mean.(Hs) .+ std.(Hs), mean.(Hs) .- std.(Hs);
-         color = (Main.COLORS[i], 0.5))
-     ylims!(-0.25, 0.25)
-     axislegend()
- end
-
- fig
- ```
-
- ## Differential entropy: order statistics estimators
-
- Entropies.jl also provides entropy estimators based on
- [order statistics](https://en.wikipedia.org/wiki/Order_statistic). These estimators
- are only defined for scalar-valued vectors, so we pass the data as `Vector{<:Real}`s instead
- of `Dataset`s, as we did for the nearest-neighbor estimators above.
+ # --------------------------
+ # Order statistic estimators
+ # --------------------------

- Here, we show how the [`Vasicek`](@ref), [`Ebrahimi`](@ref), [`AlizadehArghami`](@ref)
- and [`Correa`](@ref) direct [`Shannon`](@ref) entropy estimators, with increasing sample size,
- approach zero for samples from a uniform distribution on `[0, 1]`. The true entropy value in
- nats for this distribution is `ln(1 - 0) = 0`.
-
- ```@example MAIN
- using Entropies
- using Statistics
- using Distributions: Uniform
- using CairoMakie
-
- # Define estimators
- base = MathConstants.e # shouldn't really matter here, because the target entropy is 0.
- # just provide types here, they are instantiated inside the loop
- estimators = [Vasicek, Ebrahimi, AlizadehArghami, Correa]
- labels = ["Vasicek", "Ebrahimi", "AlizadehArghami", "Correa"]
-
- # Test each estimator `nreps` times over time series of varying length.
- Ns = [100:100:500; 1000:1000:10000]
- nreps = 30
-
- Hs_uniform = [[zeros(nreps) for N in Ns] for e in estimators]
- for (i, e) in enumerate(estimators)
+ # Just provide types here, they are instantiated inside the loop
+ estimators_os = [Vasicek, Ebrahimi, AlizadehArghami, Correa]
+ Hs_uniform_os = [[zeros(nreps) for N in Ns] for e in estimators_os]
+ for (i, est_os) in enumerate(estimators_os)
    for j = 1:nreps
-         pts = rand(Uniform(0, 1), maximum(Ns)) # raw timeseries, not a `Dataset`
+         pts = randn(maximum(Ns)) # raw timeseries, not a `Dataset`
        for (k, N) in enumerate(Ns)
            m = floor(Int, N / 100) # Scale `m` to timeseries length
-             est = e(; m, base) # Instantiate estimator with current `m`
-             Hs_uniform[i][k][j] = entropy(est, pts[1:N])
+             est = est_os(; m) # Instantiate estimator with current `m`
+             Hs_uniform_os[i][k][j] = entropy(e, est, pts[1:N])
        end
    end
end

- fig = Figure(resolution = (600, length(estimators) * 200))
- for (i, e) in enumerate(estimators)
-     Hs = Hs_uniform[i]
+ # -------------
+ # Plot results
+ # -------------
+ fig = Figure(resolution = (700, 8 * 200))
+ labels_knn = ["Kraskov", "KozachenkoLeonenko", "Zhu", "ZhuSingh"]
+ labels_os = ["Vasicek", "Ebrahimi", "AlizadehArghami", "Correa"]
+
+ for (i, e) in enumerate(knn_estimators)
+     Hs = Hs_uniform_knn[i]
    ax = Axis(fig[i,1]; ylabel = "h (nats)")
-     lines!(ax, Ns, mean.(Hs); color = Cycled(i), label = labels[i])
-     band!(ax, Ns, mean.(Hs) .+ std.(Hs), mean.(Hs) .- std.(Hs);
-         color = (Main.COLORS[i], 0.5))
-     ylims!(-0.25, 0.25)
+     lines!(ax, Ns, mean.(Hs); color = Cycled(i), label = labels_knn[i])
+     band!(ax, Ns, mean.(Hs) .- std.(Hs), mean.(Hs) .+ std.(Hs);
+         color = (Main.COLORS[i], 0.5))
+     hlines!(ax, [0.5*log(2π) + 0.5], color = :black, linewidth = 5, linestyle = :dash)
+
+     ylims!(1.2, 1.6)
+     axislegend()
+ end
+
+ for (i, e) in enumerate(estimators_os)
+     Hs = Hs_uniform_os[i]
+     ax = Axis(fig[i + length(knn_estimators), 1]; ylabel = "h (nats)")
+     lines!(ax, Ns, mean.(Hs); color = Cycled(i), label = labels_os[i])
+     band!(ax, Ns, mean.(Hs) .- std.(Hs), mean.(Hs) .+ std.(Hs);
+         color = (Main.COLORS[i], 0.5))
+     hlines!(ax, [0.5*log(2π) + 0.5], color = :black, linewidth = 5, linestyle = :dash)
+     ylims!(1.2, 1.6)
    axislegend()
end

fig
```

- As for the nearest neighbor estimators, both estimators also approach the
- true entropy value for this example, but is negatively biased for small sample sizes.
+ All estimators approach the true differential entropy, but those based on order statistics
+ are negatively biased for small sample sizes.

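+ As a minimal sketch of how to obtain a single point estimate and compare it with the
+ analytical value (this reuses the same estimator calls as the example above; `x`, `h_knn`,
+ `h_os` and the choice `m = 100` are illustrative, and exact numbers vary from run to run):
+
+ ```julia
+ using Entropies
+ using DynamicalSystemsBase # for `Dataset`, as in the example above
+
+ x = randn(10_000)                      # one long sample from a standard normal
+ e = Shannon(; base = MathConstants.e)  # Shannon entropy with natural logarithm
+ h_true = 0.5*log(2π) + 0.5             # analytical value, ≈ 1.42 nats
+
+ # kNN-based estimate (expects a `Dataset`) vs. an order-statistics estimate (raw vector)
+ h_knn = entropy(e, Kraskov(; k = 3, w = 0), Dataset(x))
+ h_os  = entropy(e, Ebrahimi(; m = 100), x)
+
+ # Using shorter slices of `x` (with a correspondingly smaller `m`) makes the
+ # small-sample bias of the order-statistics estimators directly visible.
+ (h_true, h_knn, h_os)
+ ```
+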
## Discrete entropy: permutation entropy
@@ -315,6 +301,7 @@ using Entropies
using DynamicalSystemsBase
using Random
using CairoMakie
+ using Distributions: Normal

n = 1000
ts = 1:n