Commit 27a2bb8

Azadkia-Chatterjee coefficient (#385)
* reproducible test for chatterjee coefficient too
* be explicit with keyword
* Azadkia-Chatterjee coefficient
* Fix tests
* Fix header
* Add missing reference
* Show method
* More tests based on the original paper
1 parent a0014fc commit 27a2bb8

21 files changed (+434, -13 lines)

Project.toml

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -2,7 +2,7 @@ name = "Associations"
22
uuid = "614afb3a-e278-4863-8805-9959372b9ec2"
33
authors = ["Kristian Agasøster Haaga <[email protected]>", "Tor Einar Møller <[email protected]>", "George Datseris <[email protected]>"]
44
repo = "https://github.com/kahaaga/Associations.jl.git"
5-
version = "4.1.0"
5+
version = "4.2.0"
66

77
[deps]
88
Accessors = "7d9f7c33-5ae7-4f3b-8dc6-eff91059b697"

changelog.md

Lines changed: 4 additions & 0 deletions
@@ -2,6 +2,10 @@
22

33
From version v4.0 onwards, this package has been renamed to Associations.jl.
44

5+
# 4.2
6+
7+
- New association measure: `AzadkiaChatterjeeCoefficient`.
8+
59
# 4.1
610

711
- New association measure: `ChatterjeeCorrelation`.

docs/refs.bib

Lines changed: 11 additions & 0 deletions
@@ -1322,4 +1322,15 @@ @article{Dette2013
13221322
pages={21--41},
13231323
year={2013},
13241324
publisher={Wiley Online Library}
1325+
}
1326+
1327+
@article{Azadkia2021,
1328+
title={A simple measure of conditional dependence},
1329+
author={Azadkia, Mona and Chatterjee, Sourav},
1330+
journal={The Annals of Statistics},
1331+
volume={49},
1332+
number={6},
1333+
pages={3070--3102},
1334+
year={2021},
1335+
publisher={Institute of Mathematical Statistics}
13251336
}

docs/src/associations.md

Lines changed: 1 addition & 0 deletions
@@ -115,6 +115,7 @@ PearsonCorrelation
115115
PartialCorrelation
116116
DistanceCorrelation
117117
ChatterjeeCorrelation
118+
AzadkiaChatterjeeCoefficient
118119
```
119120

120121
## [Cross-map measures](@id cross_map_api)

docs/src/examples/examples_associations.md

Lines changed: 17 additions & 0 deletions
@@ -1724,5 +1724,22 @@ z = rand(rng, 1:15, 120) .* sin.(w) # introduce some dependence
17241724
association(ChatterjeeCorrelation(handle_ties = true), w, z)
17251725
```
17261726

1727+
## [[`AzadkiaChatterjeeCoefficient`](@ref)](@id example_AzadkiaChatterjeeCoefficient)
17271728

17281729

1730+
```@example example_AzadkiaChatterjeeCoefficient
1731+
using Associations
1732+
using Random; rng = Xoshiro(1234);
1733+
x = rand(rng, 120)
1734+
y = rand(rng, 120) .* x
1735+
z = rand(rng, 120) .+ y
1736+
```
1737+
1738+
For the variables above, where `x → y → z`, we expect stronger association between `x` and `y` than
1739+
between `x` and `z`. We also expect the strength of the association between `x` and `z` to drop when conditioning on `y`, because `y` is the variable that connects `x` and `z`.
1740+
1741+
```@example example_AzadkiaChatterjeeCoefficient
1742+
m = AzadkiaChatterjeeCoefficient(theiler = 0) # only exclude self-neighbors
1743+
association(m, x, y), association(m, x, z), association(m, x, z, y)
1744+
```
1745+
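To make concrete what the coefficient in the example above measures, here is a minimal, self-contained sketch of the unconditional estimator as described by Azadkia and Chatterjee (2021). It is not the implementation added in this commit. The name `ac_coefficient` is hypothetical, the nearest-neighbor search is brute force, ties are not treated specially, and the package's `theiler` keyword is not modeled. With `R_i` the number of `y_j ≤ y_i`, `L_i` the number of `y_j ≥ y_i`, and `M(i)` the index of the nearest neighbor of `z_i`, the estimate is `Σ (n·min(R_i, R_M(i)) − L_i²) / Σ L_i·(n − L_i)`.

```julia
# Hypothetical helper, NOT part of Associations.jl: unconditional
# Azadkia-Chatterjee estimate of how strongly `y` depends on `z`,
# for scalar samples, using a brute-force nearest-neighbor search.
function ac_coefficient(y::AbstractVector{<:Real}, z::AbstractVector{<:Real})
    n = length(y)
    n == length(z) || throw(ArgumentError("inputs must have equal length"))
    n >= 2 || throw(ArgumentError("need at least two observations"))

    R = [count(<=(y[i]), y) for i in 1:n]  # R_i = #{j : y_j <= y_i}
    L = [count(>=(y[i]), y) for i in 1:n]  # L_i = #{j : y_j >= y_i}

    num = 0.0
    for i in 1:n
        # M: index of the nearest neighbor of z[i] among the other points
        # (the first index wins when distances are tied).
        best, M = Inf, 0
        for j in 1:n
            j == i && continue
            d = abs(z[i] - z[j])
            if d < best
                best, M = d, j
            end
        end
        num += n * min(R[i], R[M]) - L[i]^2
    end
    den = sum(l * (n - l) for l in L)  # zero if all y are equal
    return num / den
end

y = collect(1.0:100.0)
ac_coefficient(y, y)           # close to 1: y is a function of itself
ac_coefficient(y, reverse(y))  # same value: decreasing dependence also counts
```

Because only ranks of `y` and neighbor indices of `z` enter the formula, the estimate does not care whether the dependence is increasing or decreasing, which is what the last two calls illustrate.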

docs/src/examples/examples_independence.md

Lines changed: 77 additions & 0 deletions
@@ -138,6 +138,58 @@ independence(test, y, z)
138138

139139
The test clearly picks up on the functional dependence.
140140

141+
142+
### [Azadkia-Chatterjee coefficient](@id example_SurrogateAssociationTest_AzadkiaChatterjeeCoefficient)
143+
144+
```@example example_SurrogateAssociationTest_AzadkiaChatterjeeCoefficient
145+
using Associations
146+
using Random; rng = Xoshiro(1234)
147+
n = 1000
148+
# Some categorical variables (we add a small amount of noise to avoid duplicate points
149+
# during neighbor searches)
150+
x = rand(rng, 1.0:50.0, n) .+ rand(rng, n) .* 1e-8
151+
y = rand(rng, 1.0:50.0, n) .+ rand(rng, n) .* 1e-8
152+
test = SurrogateAssociationTest(AzadkiaChatterjeeCoefficient(), nshuffles = 19)
153+
independence(test, x, y)
154+
```
155+
156+
As expected, the test indicates that we can't reject independence. What happens if we introduce
157+
a third variable that depends on `y`?
158+
159+
```@example example_SurrogateAssociationTest_AzadkiaChatterjeeCoefficient
160+
z = rand(rng, 1.0:20.0, n) .* y
161+
independence(test, y, z)
162+
```
163+
164+
The test clearly picks up on the functional dependence. But what about conditioning?
165+
Let's define three variables where `x → y → z`. We then expect a significant association between `x` and `y`, possibly between `x` and `z` (depending on how strong the intermediate connection is), and
166+
non-significant association between `x` and `z` if conditioning on `y` (since `y` is the variable
167+
connecting `x` and `z`). The Azadkia-Chatterjee coefficient should also be able to verify these
168+
claims.
169+
170+
```@example example_SurrogateAssociationTest_AzadkiaChatterjeeCoefficient
171+
x = rand(rng, 120)
172+
y = rand(rng, 120) .* x
173+
z = rand(rng, 120) .+ y
174+
independence(test, x, y)
175+
```
176+
177+
The direct association between `x` and `y` is detected.
178+
179+
```@example example_SurrogateAssociationTest_AzadkiaChatterjeeCoefficient
180+
independence(test, x, z)
181+
```
182+
183+
The indirect association between `x` and `z` is also detected.
184+
185+
```@example example_SurrogateAssociationTest_AzadkiaChatterjeeCoefficient
186+
independence(test, x, z, y)
187+
```
188+
189+
We can't reject independence between `x` and `z` when taking into consideration
190+
`y`, as expected.
191+
192+
141193
### [Distance correlation](@id example_SurrogateAssociationTest_DistanceCorrelation)
142194

143195
```@example
@@ -393,3 +445,28 @@ The same goes for variables one step up the chain:
393445
```@example example_LocalPermutationTest
394446
independence(test, y, w, z)
395447
```
448+
449+
450+
### [[`AzadkiaChatterjeeCoefficient`](@ref)](@id example_LocalPermutationTest_AzadkiaChatterjeeCoefficient)
451+
452+
```@example example_LocalPermutationTest_AzadkiaChatterjeeCoefficient
453+
using Associations
454+
using Random; rng = Xoshiro(1234)
455+
n = 300
456+
# Continuous variables forming the chain x → y → z (no added noise is needed here,
457+
# since repeated values are not expected for uniform floating-point draws)
458+
test = LocalPermutationTest(AzadkiaChatterjeeCoefficient(), nshuffles = 19)
459+
x = rand(rng, n)
460+
y = rand(rng, n) .* x
461+
z = rand(rng, n) .+ y
462+
```
463+
464+
The three variables above form the chain `x → y → z`. We expect a
465+
non-significant association between `x` and `z` if conditioning on `y` (since `y` is the variable
466+
connecting `x` and `z`).
467+
468+
```@example example_LocalPermutationTest_AzadkiaChatterjeeCoefficient
469+
independence(test, x, z, y)
470+
```
471+
472+
The test verifies our expectation.
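
For orientation, the conditional form of the coefficient, as we understand it from Azadkia and Chatterjee (2021), is given below in the paper's notation rather than the package's. Here `Y` is the response, `Z` is the variable whose extra contribution is assessed, and `X` is the conditioning variable (the role `y` plays in `independence(test, x, z, y)` above). `R_i` is the number of `Y_j ≤ Y_i`, `N(i)` is the index of the nearest neighbor of `X_i`, and `M(i)` is the index of the nearest neighbor of the joint point `(X_i, Z_i)`. How the package maps its first two arguments onto `Y` and `Z` is not stated in this diff, so treat that mapping as an assumption.

```math
T_n(Y, Z \mid X) =
\frac{\sum_{i=1}^{n} \left( \min\{R_i, R_{M(i)}\} - \min\{R_i, R_{N(i)}\} \right)}
     {\sum_{i=1}^{n} \left( R_i - \min\{R_i, R_{N(i)}\} \right)}
```

The quantity this estimates is zero exactly when `Y` and `Z` are conditionally independent given `X`, which is why a non-significant result is expected once the mediating variable is conditioned on.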

src/core.jl

Lines changed: 2 additions & 1 deletion
@@ -38,6 +38,7 @@ an [`AssociationMeasureEstimator`](@ref) to compute.
3838
| Correlation | [`PartialCorrelation`](@ref) | ✓ | ✓ |
3939
| Correlation | [`DistanceCorrelation`](@ref) | ✓ | ✓ |
4040
| Correlation | [`ChatterjeeCorrelation`](@ref) | ✓ | ✖ |
41+
| Correlation | [`AzadkiaChatterjeeCoefficient`](@ref) | ✓ | ✓ |
4142
| Closeness | [`SMeasure`](@ref) | ✓ | ✖ |
4243
| Closeness | [`HMeasure`](@ref) | ✓ | ✖ |
4344
| Closeness | [`MMeasure`](@ref) | ✓ | ✖ |
@@ -93,6 +94,7 @@ Concrete subtypes are given as input to [`association`](@ref).
9394
| [`DistanceCorrelation`](@ref) | Not required |
9495
| [`PartialCorrelation`](@ref) | Not required |
9596
| [`ChatterjeeCorrelation`](@ref) | Not required |
97+
| [`AzadkiaChatterjeeCoefficient`](@ref) | Not required |
9698
| [`SMeasure`](@ref) | Not required |
9799
| [`HMeasure`](@ref) | Not required |
98100
| [`MMeasure`](@ref) | Not required |
@@ -125,7 +127,6 @@ Concrete subtypes are given as input to [`association`](@ref).
125127
| [`KLDivergence`](@ref) | [`JointProbabilities`](@ref) |
126128
| [`RenyiDivergence`](@ref) | [`JointProbabilities`](@ref) |
127129
| [`VariationDistance`](@ref) | [`JointProbabilities`](@ref) |
128-
129130
"""
130131
abstract type AssociationMeasureEstimator end
131132

src/independence_tests/local_permutation/LocalPermutationTest.jl

Lines changed: 13 additions & 8 deletions
@@ -74,13 +74,15 @@ instead of `Z` and we `I(X; Y)` and `Iₖ(X̂; Y)` instead of `I(X; Y | Z)` and
7474
7575
## Compatible measures
7676
77-
| Measure | Pairwise | Conditional | Requires `est` | Note |
78-
| ---------------------------------- | :------: | :---------: | :------------: | :-------------------------------------------------------------------------------------------------------------------------------: |
79-
| [`PartialCorrelation`](@ref) | ✖ | ✓ | No | |
80-
| [`DistanceCorrelation`](@ref) | ✖ | ✓ | No | |
81-
| [`CMIShannon`](@ref) | ✖ | ✓ | Yes | |
82-
| [`TEShannon`](@ref) | ✓ | ✓ | Yes | Pairwise tests not possible with `TransferEntropyEstimator`s, only lower-level estimators, e.g. `FPVP`, `GaussianMI` or `Kraskov` |
83-
| [`PartialMutualInformation`](@ref) | ✖ | ✓ | Yes | |
77+
| Measure | Pairwise | Conditional | Requires `est` | Note |
78+
| -------------------------------------- | :------: | :---------: | :------------: | :-------------------------------------------------------------------------------------------------------------------------------: |
79+
| [`PartialCorrelation`](@ref) | ✖ | ✓ | No | |
80+
| [`DistanceCorrelation`](@ref) | ✖ | ✓ | No | |
81+
| [`CMIShannon`](@ref) | ✖ | ✓ | Yes | |
82+
| [`TEShannon`](@ref) | ✓ | ✓ | Yes | Pairwise tests not possible with `TransferEntropyEstimator`s, only lower-level estimators, e.g. `FPVP`, `GaussianMI` or `Kraskov` |
83+
| [`PartialMutualInformation`](@ref) | ✖ | ✓ | Yes | |
84+
| [`AzadkiaChatterjeeCoefficient`](@ref) | ✖ | ✓ | No | |
85+
8486
8587
The `LocalPermutationTest` is only defined for conditional independence testing.
8688
Exceptions are for measures like [`TEShannon`](@ref), which use conditional
@@ -96,6 +98,8 @@ The nearest-neighbor approach in Runge (2018) can be reproduced by using the
9698
Conditional independence test using [`CMIShannon`](@ref)
9799
- [Example 2](@ref example_LocalPermutationTest_TEShannon):
98100
Conditional independence test using [`TEShannon`](@ref)
101+
- [Example 3](@ref example_LocalPermutationTest_AzadkiaChatterjeeCoefficient):
102+
Conditional independence test using [`AzadkiaChatterjeeCoefficient`](@ref)
99103
"""
100104
struct LocalPermutationTest{M, C, R} <: IndependenceTest{M}
101105
est_or_measure::M
@@ -231,5 +235,6 @@ end
231235
function LocalPermutationTest(m::MultivariateInformationMeasure; kwargs...)
232236
throw(ArgumentError("You need to provide an estimator for the multivariate information measure $(typeof(m)), not only the definition."))
233237
end
234-
# TODO: fix this
238+
235239
include("transferentropy.jl")
240+
include("azadkia_chatterjee_coefficient.jl")
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
1+
function independence(test::LocalPermutationTest{<:AzadkiaChatterjeeCoefficient}, x::AbstractVector, y, z)
2+
est_or_measure, nshuffles = test.est_or_measure, test.nshuffles
3+
# Make sure that the measure is compatible with the input data.
4+
verify_number_of_inputs_vars(est_or_measure, 3)
5+
6+
Y, Z = StateSpaceSet(y), StateSpaceSet(z)
7+
@assert length(x) == length(Y) == length(Z)
8+
Î = association(est_or_measure, x, Y, Z)
9+
Îs = permuted_Îs(x, Y, Z, est_or_measure, test)
10+
p = count(Î .<= Îs) / nshuffles
11+
return LocalPermutationTestResult(3, Î, Îs, p, nshuffles)
12+
end
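
The p-value line in the function above is the fraction of permuted statistics that are at least as large as the observed one. A toy illustration with made-up numbers follows. Nothing here calls the package, and only the last line mirrors the diff.

```julia
# Made-up numbers for illustration; only the last line mirrors the diff above.
Î = 0.42                               # observed statistic (hypothetical)
Îs = [0.05, 0.51, -0.02, 0.11, 0.08]   # permuted statistics (hypothetical)
nshuffles = length(Îs)

# One permuted value (0.51) is >= Î, so p = 1/5.
p = count(Î .<= Îs) / nshuffles
```

With `nshuffles = 19`, as in the documentation examples, this formula can only return multiples of 1/19, so the smallest nonzero p-value is about 0.053, and `p` is exactly zero when no permuted value reaches the observed one.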

src/independence_tests/surrogate/SurrogateAssociationTest.jl

Lines changed: 5 additions & 0 deletions
@@ -63,6 +63,10 @@ For each shuffle, `est_or_measure` is recomputed and the results are stored.
6363
[`CMIShannon`](@ref) test for conditional independence on categorical data.
6464
- [Example 6](@ref example_independence_MCR): [`MCR`](@ref) test for
6565
pairwise and conditional independence.
66+
- [Example 7](@ref example_SurrogateAssociationTest_ChatterjeeCorrelation).
67+
[`ChatterjeeCorrelation`](@ref) test for pairwise independence.
68+
- [Example 8](@ref example_SurrogateAssociationTest_AzadkiaChatterjeeCoefficient).
69+
[`AzadkiaChatterjeeCoefficient`](@ref) test for pairwise and conditional independence.
6670
"""
6771
struct SurrogateAssociationTest{E, R, S} <: IndependenceTest{E}
6872
est_or_measure::E
@@ -152,6 +156,7 @@ include("transferentropy.jl")
152156
include("crossmapping.jl")
153157
include("hlms_measure.jl")
154158
include("chatterjee_correlation.jl")
159+
include("azadkia_chatterjee_coefficient.jl")
155160

156161
# Input checks
157162
function SurrogateAssociationTest(measure::T) where T <: MultivariateInformationMeasure
