
Commit 36ac16f

Merge pull request #175 from matteoceriscioli/mlingam
Add m-LiNGAM method for missing data scenarios
2 parents abac903 + c9941c6

File tree

11 files changed (+1003, −1 lines)

docs/reference/index.rst

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@ API Reference
    lim
    group_lingam
    group_direct_lingam
+   missingness_lingam
    causal_effect
    utils
    tools
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+.. module:: lingam
+
+mLiNGAM
+=============
+
+.. autoclass:: mLiNGAM
+   :members:
+   :inherited-members:

docs/reference/utils.rst

Lines changed: 1 addition & 0 deletions
@@ -24,3 +24,4 @@ utils
 .. autofunction:: get_common_edge_probabilities
 .. autofunction:: print_common_edge_directions
 .. autofunction:: make_dot_for_nan_probability_matrix
+.. autofunction:: bic_select_logistic_l1

docs/tutorial/index.rst

Lines changed: 1 addition & 0 deletions
@@ -43,3 +43,4 @@ Contents:
    evaluate_model_fit
    bootstrap_with_imputation
    high_dim_direct_lingam
+   missingness_lingam
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+m-LiNGAM
+=========
+
+Model
+-------------------
+
+Missingness-LiNGAM (m-LiNGAM) extends the basic LiNGAM [1]_ model to handle datasets affected by missing values, including Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR) cases.
+It enables identification of the true underlying causal structure and provides unbiased parameter estimates even when data are not fully observed.
+
+The model combines the principles of LiNGAM with the graphical representation of missingness mechanisms as *missingness graphs* (m-graphs) [2]_.
+In this framework, variables can be fully observed or partially observed, and each partially observed variable is associated with a missingness mechanism and a proxy variable.
+
+Let the set of variables be:
+
+.. math::
+
+   V = V_o \cup V_m \cup U \cup V^* \cup R
+
+where:
+
+- :math:`V_o` are fully observed variables,
+- :math:`V_m` are partially observed variables,
+- :math:`U` are latent variables (here assumed empty),
+- :math:`V^*` are proxy variables (what is actually observed, corresponding to dataset columns with missing values),
+- :math:`R` are missingness mechanisms.
+
+The induced subgraph :math:`G[V_o \cup V_m]` follows a LiNGAM model, meaning that for every variable :math:`X_i \in (V_o \cup V_m)`:
+
+.. math::
+
+   x_i = \sum_{k(j)<k(i)}b_{ij}x_j + e_i, \qquad e_i\sim \text{Non-Gaussian}(\cdot)
+
+where :math:`i\in\{1,\dots,n\}\mapsto k(i)` denotes a causal order, and the non-Gaussian error terms are mutually independent.
+
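As a side note on the equation above, the continuous part of the model is easy to simulate directly. A minimal sketch follows; the causal order, coefficients, and uniform noise choice are invented for illustration and are not taken from this commit:

```python
# Hypothetical example: draw samples from a small LiNGAM structural equation
# model x1 -> x2 -> x3 with non-Gaussian (here uniform) error terms.
import numpy as np

rng = np.random.default_rng(0)
n = 10000

# Independent non-Gaussian errors e_i.
e = rng.uniform(-1.0, 1.0, size=(n, 3))

# Each variable is a linear combination of its predecessors plus its error,
# following x_i = sum_{k(j)<k(i)} b_ij x_j + e_i.
x1 = e[:, 0]
x2 = 0.8 * x1 + e[:, 1]
x3 = -0.5 * x2 + e[:, 2]
X = np.column_stack([x1, x2, x3])
```

Because the errors are independent and non-Gaussian, data generated this way is exactly the kind of input for which the LiNGAM family identifies the causal order.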
+The induced subgraph :math:`G[V_o \cup V_m \cup R]` follows a LiM model. The missingness mechanisms :math:`R_i \in R` follow a logistic model, as for binary variables in LiM [3]_:
+
+.. math::
+
+   r_i = \mathbf 1\llbracket\sum_{k(j)<k(i)} b_{ij} x_j + e_i > 0\rrbracket, \qquad e_i \sim \text{Logistic}(0,1)
+
+
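The logistic missingness mechanism can likewise be sketched in a few lines. The following hypothetical MAR setup, in which a fully observed x1 drives the missingness of x2, uses invented names and coefficients; the convention that r2 == 1 means "missing" is also an assumption of this sketch, not fixed by the text:

```python
# Hypothetical example: a logistic missingness mechanism and its proxy variable.
import numpy as np

rng = np.random.default_rng(1)
n = 10000

# Continuous part: x1 -> x2 with non-Gaussian (uniform) noise.
x1 = rng.uniform(-1.0, 1.0, size=n)
x2 = 0.8 * x1 + rng.uniform(-1.0, 1.0, size=n)

# Missingness mechanism r2 = 1[b * x1 + e > 0] with e ~ Logistic(0, 1).
# x2 itself does not appear here (no direct self-masking).
e = rng.logistic(loc=0.0, scale=1.0, size=n)
r2 = (1.5 * x1 + e > 0).astype(int)

# Proxy variable x2*: equal to x2 where observed, NaN where r2 marks it missing.
x2_star = np.where(r2 == 0, x2, np.nan)
```

The columns one actually gets in a real dataset correspond to x1 and x2_star; x2 and r2 are only partially recoverable, which is exactly the situation m-LiNGAM addresses.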
+Assumptions
+^^^^^^^^^^^^^^^^^^
+
+The following assumptions are made to ensure identifiability:
+
+#. No latent confounders (:math:`U = \emptyset`).
+#. No causal interactions between missingness mechanisms (:math:`R_i \notin Pa(R_j)` for all :math:`i \neq j`).
+#. No direct self-masking (:math:`X_i \notin Pa(R_i)` for any :math:`X_i \in V_m`).
+
+Note that although direct self-masking is not allowed, a partially observed variable can still be an indirect cause (an ancestor) of its own missingness mechanism (indirect self-masking).
+Under these assumptions, m-LiNGAM guarantees identifiability of both the causal structure and the parameters from observational data in the large-sample limit.
+
+An example Python notebook demonstrating m-LiNGAM is available `here <https://github.com/cdt15/lingam/blob/master/examples/MissingnessLiNGAM.ipynb>`__.
+
+References
+-------------------
+
+.. [1] S. Shimizu, P. O. Hoyer, A. Hyvärinen, and A. J. Kerminen.
+   *A Linear Non-Gaussian Acyclic Model for Causal Discovery.*
+   Journal of Machine Learning Research, 7:2003–2030, 2006.
+
+.. [2] K. Mohan, J. Pearl, and J. Tian.
+   *Graphical Models for Inference with Missing Data.*
+   Advances in Neural Information Processing Systems (NeurIPS), 2013.
+
+.. [3] Y. Zeng, S. Shimizu, H. Matsui, and F. Sun.
+   *Causal Discovery for Linear Mixed Data.*
+   In Proceedings of the First Conference on Causal Learning and Reasoning (CLeaR 2022), PMLR 177, pp. 994–1009, 2022.

docs/tutorial/resit.rst

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ RESIT [2]_ is an estimation algorithm for Additive Noise Model [1]_.
 
 This method makes the following assumptions.
 
-#. Continouos variables
+#. Continuous variables
 #. Nonlinearity
 #. Additive noise
 #. Acyclicity

0 commit comments