Commit be1a3a9

MVPC (#137)
* Adding MVPC algorithm (mvpc)
* Adding method to sample data with missing values (mvpc_gen_data)
1 parent d1c3a82 commit be1a3a9

25 files changed

+932
-39
lines changed

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-2.9.0
+2.10.0

docs/render_data_docs.py

Lines changed: 17 additions & 1 deletion
@@ -206,13 +206,29 @@ def info_to_small_table():
         module_str += "\n\n"
         module_str += content
         module_str += "\n\n"
+
 
         if p.name == "fixed_data":
             with open(p/"data_info.json") as json_data_file:
                 fixed_data_info = json.load(json_data_file)
             module_str += fixed_data_to_table(fixed_data_info, p)
             module_str += "\n\n"
-
+        else:
+            with open(s) as json_file:
+                schema = json.load(json_file)
+
+            tmp = any(["description" in obj
+                       for prop, obj in schema["items"]["properties"].items()
+                       if prop != "id"])
+
+            if tmp:
+                module_str += ".. rubric:: Some fields described \n"
+                for prop, obj in sorted(schema["items"]["properties"].items()):
+                    if prop == "id":
+                        continue
+                    if "description" in obj:
+                        module_str += "* ``{}`` {} \n".format(prop, obj["description"])
+
 
         if dump != "":
             module_str += "\n\n"

docs/source/available_background_knowledge.rst

Lines changed: 18 additions & 16 deletions
@@ -1,37 +1,39 @@
 .. _edge_constraints:
 
-Edge Constraints
+Edge constraints
 ---------------------------------------
 
 Benchpress allows users to incorporate edge constraints to guide structure learning algorithms in several packages:
-**pcalg**, **bnlearn**, **tetrad**, **gobnilp**, and **bidag**. These constraints enable the inclusion of prior knowledge to refine
+**pcalg**, **mvpc**, **bnlearn**, **tetrad**, **gobnilp**, and **bidag**. These constraints enable the inclusion of prior knowledge to refine
 the search space of causal graphs, improving the reliability of the inferred relationships. Users can specify **forbidden or
 required edges**, **tiers for temporal ordering**, and **group-based constraints**.
 
 The edge constraints should be defined in a JSON file located within the ``resources/constraints`` folder.
 
 .. rubric:: Supported Constraints
 
-+--------------------+---------------------+---------------------+--------------------+----------------------+-----------------------+
-| **Package**        | **forbidden_edges** | **required_edges**  | **tiers**          | **forbidden_groups** | **required_groups**   |
-+====================+=====================+=====================+====================+======================+=======================+
-| pcalg              | X                   | X                   | N/A                | N/A                  | N/A                   |
-+--------------------+---------------------+---------------------+--------------------+----------------------+-----------------------+
-| bnlearn            | X                   | X                   | X                  | X                    | X                     |
-+--------------------+---------------------+---------------------+--------------------+----------------------+-----------------------+
-| tetrad             | X                   | X                   | X                  | X                    | X                     |
-+--------------------+---------------------+---------------------+--------------------+----------------------+-----------------------+
-| gobnilp            | X                   | X                   | X                  | X                    | X                     |
-+--------------------+---------------------+---------------------+--------------------+----------------------+-----------------------+
-| bidag              | X                   | N/A                 | N/A                | X                    | N/A                   |
-+--------------------+---------------------+---------------------+--------------------+----------------------+-----------------------+
++-------------+---------------------+--------------------+-----------+----------------------+---------------------+
+| **Package** | **forbidden_edges** | **required_edges** | **tiers** | **forbidden_groups** | **required_groups** |
++=============+=====================+====================+===========+======================+=====================+
+| mvpc        | X                   | X                  | N/A       | N/A                  | N/A                 |
++-------------+---------------------+--------------------+-----------+----------------------+---------------------+
+| pcalg       | X                   | X                  | N/A       | N/A                  | N/A                 |
++-------------+---------------------+--------------------+-----------+----------------------+---------------------+
+| bnlearn     | X                   | X                  | X         | X                    | X                   |
++-------------+---------------------+--------------------+-----------+----------------------+---------------------+
+| tetrad      | X                   | X                  | X         | X                    | X                   |
++-------------+---------------------+--------------------+-----------+----------------------+---------------------+
+| gobnilp     | X                   | X                  | X         | X                    | X                   |
++-------------+---------------------+--------------------+-----------+----------------------+---------------------+
+| bidag       | X                   | N/A                | N/A       | X                    | N/A                 |
++-------------+---------------------+--------------------+-----------+----------------------+---------------------+
 
 .. rubric:: Description
 
 - ``forbidden_edges``: A list of directed edges that are explicitly prohibited from existing between specific nodes. Each edge is defined as a pair of nodes, where the first node cannot directly cause the second node.
 - ``required_edges``: A list of directed edges that are enforced between specific nodes. Each edge is defined as a pair of nodes, where the first node must directly cause the second node.
 
-- *Note: For algorithms in the* **pcalg** *package, the above attributes only specify the presence or absence of edges and do not control their directionality.*
+- *Note: For algorithms in the* **pcalg** *and* **mvpc** *packages, the above attributes only specify the presence or absence of edges and do not control their directionality.*
 - ``tiers``: Defines a temporal ordering of nodes across multiple levels (or) tiers. Nodes in one tier are constrained from causing nodes in any of the preceding tiers.
 - ``tier_settings``:
 

docs/source/available_data.rst

Lines changed: 6 additions & 0 deletions
@@ -13,6 +13,7 @@ Data
     data/fixed_data
     data/gcastle_iidsim
     data/iid
+    data/mvpc_gen_data
 The available data modules are listed below.
 
 
@@ -41,6 +42,11 @@ The available data modules are listed below.
       -
       -
       - :ref:`iid`
+    * - Missing data generation
+      - `DAG <https://en.wikipedia.org/wiki/Directed_acyclic_graph>`__
+      - `MVPC <https://github.com/felixleopoldo/MVPC>`__
+      - d901361
+      - :ref:`mvpc_gen_data`
 
 
 

docs/source/available_structure_learning_algorithms.rst

Lines changed: 5 additions & 1 deletion
@@ -30,7 +30,6 @@ Algorithms
     structure_learning_algorithms/bnlearn_sihitonpc
     structure_learning_algorithms/bnlearn_tabu
     structure_learning_algorithms/causaldag_gsp
-    structure_learning_algorithms/causallearn_ges
     structure_learning_algorithms/causallearn_grasp
     structure_learning_algorithms/corr_thresh
     structure_learning_algorithms/dualpc
@@ -53,6 +52,7 @@ Algorithms
     structure_learning_algorithms/huge_glasso
     structure_learning_algorithms/huge_mb
     structure_learning_algorithms/huge_tiger
+    structure_learning_algorithms/mvpc
     structure_learning_algorithms/paralleldg
     structure_learning_algorithms/pcalg_gies
     structure_learning_algorithms/pcalg_pc
@@ -259,6 +259,10 @@ To add new modules, see :ref:`new_modules`.
       - `UG <https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)#Graph>`__
       - `huge <https://cran.r-project.org/web/packages/huge/index.html>`__
       - :ref:`huge_tiger`
+    * - MVPC
+      - `CPDAG <https://search.r-project.org/CRAN/refmans/pcalg/html/dag2cpdag.html>`__
+      - `MVPC <https://github.com/felixleopoldo/MVPC>`__
+      - :ref:`mvpc`
     * - Parallel DG
       - `DG <https://en.wikipedia.org/wiki/Chordal_graph>`__
       - `parallelDG <https://github.com/melmasri/parallelDG>`__

docs/source/data_formats.rst

Lines changed: 10 additions & 0 deletions
@@ -124,6 +124,16 @@ If in the continuous example above there would be two additional observations wh
 1.2,1.2,2.2,4.2,1,0
 1.1,1.5,1.4,2.2,1,1
 
+Missing data
+*************
+
+Missing data is indicated by the absence of a value. Below is an example of a dataset where the value of column b is missing in the second row.
+
+.. rubric:: Example (missing data)
+
+a,b,c,d
+0.2,2.3,5.3,0.5
+3.2,,2.5,1.2
 
 
 Parameters
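
The missing-data convention documented in the `docs/source/data_formats.rst` change above (an empty field marks a missing value) could be read along the following lines. This is a minimal sketch using only the Python standard library. The helper name `read_with_missing` and the choice to represent missing values as NaN are assumptions for illustration and are not part of Benchpress.

```python
import csv
import io
import math


def read_with_missing(text):
    """Parse CSV text with a header row; empty fields become NaN.

    Hypothetical helper for illustration only (not a Benchpress function).
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [
        [float(value) if value != "" else math.nan for value in row]
        for row in reader
    ]
    return header, rows


# The example dataset from the documentation change above.
sample = "a,b,c,d\n0.2,2.3,5.3,0.5\n3.2,,2.5,1.2\n"
header, rows = read_with_missing(sample)

# Report which cells are missing, by column name and data-row index.
missing = [
    (header[j], i)
    for i, row in enumerate(rows)
    for j, value in enumerate(row)
    if math.isnan(value)
]
print(missing)
```

Mapping empty fields to NaN keeps every column numeric, so downstream code can detect missing cells with `math.isnan` instead of mixing strings and floats.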

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -96,6 +96,7 @@ generated datasets, the workflow also includes a number of standard datasets and
 
 .. rubric:: News
 
+* 2024-11-30: Benchpress 2.10.0. This version includes algorithms from the MVPC package for sampling (:ref:`mvpc_gen_data`) and causal discovery (:ref:`mvpc`) in the presence of missing data.
 * 2024-11-24: Benchpress 2.9.0. This version comes with three new major features.
 
 I) The ability to incorporate background knowledge in terms of :ref:`edge_constraints`. Thanks to `Gomathi Lakshmanan <https://www.linkedin.com/in/gomathi-l/>`_ for this great feature.

docs/source/structure_learning_algorithms/huge_glasso.rst

Lines changed: 54 additions & 1 deletion
@@ -31,7 +31,60 @@ huge_glasso
 
 .. rubric:: Description
 
-Abstract: We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple algorithm—the graphical lasso—that is remarkably fast: It solves a 1000-node problem (∼500000 parameters) in at most a minute and is 30–4000 times faster than competing methods. It also provides a conceptual link between the exact problem and the approximation suggested by Meinshausen and Bühlmann (2006). We illustrate the method on some cell-signaling data from proteomics.
+Abstract:
+We consider the problem of estimating the marginal independence structure of a Bayesian network from observational data in the form of an undirected graph called the unconditional dependence graph. We show that unconditional dependence graphs of Bayesian networks correspond to the graphs having equal independence and intersection numbers. Using this observation, a Gröbner basis for a toric ideal associated to unconditional dependence graphs of Bayesian networks is given and then extended by additional binomial relations to connect the space of all such graphs. An MCMC method, called GrUES (Gröbner-based Unconditional Equivalence Search), is implemented based on the resulting moves and applied to synthetic Gaussian data. GrUES recovers the true marginal independence structure via a penalized maximum likelihood or MAP estimate at a higher rate than simple independence tests while also yielding an estimate of the posterior, for which the 20% HPD credible sets include the true structure at a high rate for data-generating graphs with density at least 0.5.
+
+.. rubric:: Example
+
+Config file: `grues_vs_corr-thresh.json <https://github.com/felixleopoldo/benchpress/blob/master/workflow/rules/structure_learning_algorithms/grues/grues_vs_corr-thresh.json>`_
+
+Command:
+
+.. code:: bash
+
+    snakemake --cores all --use-singularity --configfile workflow/rules/structure_learning_algorithms/grues/grues_vs_corr-thresh.json
+
+:numref:`roc_grues_vs_thresh` shows the ROC and :numref:`shd_grues_vs_thresh` shows the SHD comparing GrUES to correlation thresholding for datasets from five different graphs corresponding to a 5-variable random Gaussian SEM whose nodes have average degree of 1 and whose edge weights were allowed to be close to 0. Each dataset contains 300 observations and each Markov chain has 10000 observations. Note that SHD between a learned UDG and true CPDAG is not the most reasonable comparison because an inflated FPR will be reported---see :footcite:t:`grues2023` for discussion and a more reasonable benchmark.
+
+:numref:`adj_grues` shows that GrUES estimates the correct `UDG <https://arxiv.org/pdf/2210.00822.pdf#subsection.2.2>`__ while correlation thresholding (:numref:`adj_thresh`) misses the edge `1---2`.
+
+
+.. _roc_grues_vs_thresh:
+
+.. figure:: ../../../workflow/rules/structure_learning_algorithms/grues/images/roc.png
+    :width: 320
+    :alt: ROC (FPR vs. TPR) GrUES vs corr_thresh example
+    :align: left
+
+    ROC of GrUES vs corr_thresh.
+
+.. _shd_grues_vs_thresh:
+
+.. figure:: ../../../workflow/rules/structure_learning_algorithms/grues/images/shd.png
+    :width: 320
+    :alt: SHD GrUES vs corr_thresh example
+    :align: right
+
+    SHD of GrUES vs corr_thresh.
+
+.. _adj_grues:
+
+.. figure:: ../../../workflow/rules/structure_learning_algorithms/grues/images/diffplot_30.png
+    :width: 320
+    :alt: adjacency matrix GrUES example
+    :align: left
+
+    Adj mat learned by GrUES.
+
+.. _adj_thresh:
+
+.. figure:: ../../../workflow/rules/structure_learning_algorithms/grues/images/diffplot_15.png
+    :width: 320
+    :alt: adjacency matrix corr_thresh example
+    :align: right
+
+    Adj mat learned by corr_thresh.
+
 
 .. rubric:: Some fields described
 * ``lambda`` A positive number to control the regularization. Typical usage is to leave the input lambda: null and have the program compute its own.
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+@article{mohan2013graphical,
+  title={Graphical models for inference with missing data},
+  author={Mohan, Karthika and Pearl, Judea and Tian, Jin},
+  journal={Advances in neural information processing systems},
+  volume={26},
+  year={2013},
+  url={https://proceedings.neurips.cc/paper_files/paper/2013/file/0ff8033cf9437c213ee13937b1c4c455-Paper.pdf}
+}
+
+@article{rubin1976inference,
+  title={Inference and missing data},
+  author={Rubin, Donald B},
+  journal={Biometrika},
+  volume={63},
+  number={3},
+  pages={581--592},
+  year={1976},
+  publisher={Oxford University Press},
+  url={https://academic.oup.com/biomet/article-abstract/63/3/581/270932?redirectedFrom=fulltext}
+}
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+Module for synthetic Gaussian data generation with different types of missingness: missing at random (MAR),
+missing completely at random (MCAR), and missing not at random (MNAR) :footcite:t:`mohan2013graphical`, :footcite:t:`rubin1976inference`.
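
To make the three missingness types named in the module description above concrete, here is a minimal sketch of how each could be imposed on toy Gaussian data. It is not the MVPC package's generator. The function `apply_missingness` and its parameters (`target`, `driver`, `p`, `threshold`, `seed`) are hypothetical names chosen for illustration, and the simple threshold rule is an assumption. The sketch follows the usual textbook reading: under MCAR the chance of a value going missing ignores the data, under MAR it depends only on a fully observed column, and under MNAR it depends on the value that goes missing.

```python
import math
import random


def apply_missingness(rows, mechanism, target=1, driver=0, p=0.3,
                      threshold=0.0, seed=0):
    """Return a copy of rows with some values in column `target` set to NaN.

    Hypothetical helper for illustration only (not the MVPC implementation).
    """
    rng = random.Random(seed)
    out = []
    for row in rows:
        new_row = list(row)
        if mechanism == "MCAR":
            # Missingness is independent of all data values.
            eligible = True
        elif mechanism == "MAR":
            # Missingness depends only on a fully observed column.
            eligible = row[driver] > threshold
        elif mechanism == "MNAR":
            # Missingness depends on the value that goes missing.
            eligible = row[target] > threshold
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if eligible and rng.random() < p:
            new_row[target] = math.nan
        out.append(new_row)
    return out


# Two independent standard Gaussian columns as toy data.
data_rng = random.Random(1)
data = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(5)]
for mechanism in ("MCAR", "MAR", "MNAR"):
    print(mechanism, apply_missingness(data, mechanism, p=0.5))
```

Only the `target` column ever loses values here, so the `driver` column stays fully observed, which is what the MAR case requires.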
