Commit a26247c

Update paper
1 parent 5d5d556 commit a26247c

4 files changed: +68 -28 lines changed

.prettierignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+paper/paper.md

paper/README.md

Lines changed: 0 additions & 6 deletions
@@ -10,12 +10,6 @@ Target submission date: June 20th, 2025.
 
 ## TODO
 
-- mention HCM data structure from `[@weinstein2024hierarchicalcausalmodels]`,
-  _only after_ https://github.com/y0-causal-inference/y0/pull/236 is finished
-  and merged
-- @Jeremy reference other PNNL use cases (even if they're not published)
-- @Jeremy there's a note for you to fill in a sentence in the future work
-  paragraph
 - Get Pruthvi's ORCID
 
 ## Linting

paper/paper.bib

Lines changed: 25 additions & 0 deletions
@@ -431,3 +431,28 @@ @article{weinstein2024hierarchicalcausalmodels
   title = {Hierarchical Causal Models},
 }
 
+@article{mohan2021,
+  author = {Mohan, Karthika and and, Judea Pearl},
+  publisher = {ASA Website},
+  url = {https://doi.org/10.1080/01621459.2021.1874961},
+  date = {2021},
+  doi = {10.1080/01621459.2021.1874961},
+  eprint = {https://doi.org/10.1080/01621459.2021.1874961},
+  journaltitle = {Journal of the American Statistical Association},
+  number = {534},
+  pages = {1023--1037},
+  title = {Graphical Models for Processing Missing Data},
+  volume = {116},
+}
+
+@article{tikka2017b,
+  author = {Tikka, Santtu and Karvanen, Juha},
+  url = {http://jmlr.org/papers/v18/16-166.html},
+  date = {2017},
+  journaltitle = {Journal of Machine Learning Research},
+  number = {36},
+  pages = {1--30},
+  title = {Simplifying Probabilistic Expressions in Causal Inference},
+  volume = {18},
+}
+

paper/paper.md

Lines changed: 42 additions & 22 deletions
@@ -48,6 +48,12 @@ authors:
         degree: supporting
       - type: supervision
         degree: supporting
+  - name: Haley Hummel
+    orcid: 0009-0004-5405-946X
+    affiliation: 4
+    roles:
+      - type: software
+        degree: supporting
   - name: Nathaniel Merrill
     orcid: 0000-0002-1998-0980
     affiliation: 2
@@ -67,7 +73,13 @@ authors:
         degree: supporting
   - name: Marc-Antoine Parent
     orcid: 0000-0003-4159-7678
-    affiliation: 4
+    affiliation: 5
+    roles:
+      - type: software
+        degree: supporting
+  - name: Adam Rupe
+    affiliation: 2
+    orcid: 0000-0003-0105-8987
     roles:
       - type: software
         degree: supporting
@@ -104,8 +116,11 @@ affiliations:
   - name: Northeastern University
     index: 3
     ror: 04t5xt781
-  - name: Conversence
+  - name: Oregon State University
     index: 4
+    ror: 00ysfqy60
+  - name: Conversence
+    index: 5
 
 date: 9 May 2025
 ---
@@ -191,8 +206,9 @@ Verma constraints [@tian2012verma].
 algorithms of any causal inference package. It implements `ID`
 [@shpitser2006id], `IDC` [@shpitser2007idc], `ID*` [@shpitser2012idstar], `IDC*`
 [@shpitser2012idstar], surrogate outcomes (`TRSO`) [@tikka2019trso], `tian-ID`
-[@tian2010identifying], transport [@correa2020transport], and counterfactual
-transport [@correa2022cftransport].
+[@tian2010identifying], transport [@correa2020transport], counterfactual
+transport [@correa2022cftransport], and identification for causal queries over
+hierarchical causal models [@weinstein2024hierarchicalcausalmodels].
 
 # Case Study
 
@@ -204,9 +220,7 @@ following prior knowledge:
 2. Accumulation of tar in the lungs increase the risk of cancer
 3. Smoking itself also increases the risk of cancer
 
-![**A**) A simplified acyclic directed graph model representing prior knowledge on smoking and cancer and **B
-**) a more complex acyclic directed mixed graph that explicitly represents confounding variables.](figures/cancer_tar.pdf){#cancer
-height="100pt"}
+![**A**) A simplified acyclic directed graph model representing prior knowledge on smoking and cancer and **B**) a more complex acyclic directed mixed graph that explicitly represents confounding variables.](figures/cancer_tar.pdf){#cancer height="100pt"}
 
 The ID algorithm [@shpitser2006id] estimates the effect of smoking on the risk
 of cancer in \autoref{cancer}A as
@@ -240,27 +254,33 @@ We highlight several which used (and motivated further development of) $Y_0$:
   workflow for simple causal queries compatible with `ID`.
 - [@ness_causal_2024] uses $Y_0$ as a teaching tool for identification and the
   causal hierarchy
-- TODO Jeremy reference other PNNL use cases (even if they're not published)
 
 # Future direction
 
 There remain several high value identification algorithms to include in $Y_0$ in
-the future. For example, the generalized ID (`gID`) [@lee2019general] and
-generalized counterfactual ID (`gID*`) [@correa2021counterfactual] are important
-because TODO Jeremy. The cyclic ID (`ioID`)
+the future. For example, the cyclic ID (`ioID`)
 [@forré2019causalcalculuspresencecycles] is important to work with more
 realistic graphs that contain cycles, such as how biomolecular signaling
-pathways often contain feedback loops.
-
-Similarly, it remains an open research question on how to estimate the causal
-effect for an arbitrary estimand produced by an algorithm more sophisticated
-than `ID`. Two potential avenues for overcoming this might be a combination of
-the Pyro probabilistic programming langauge [@bingham2018pyro] and its causal
-inference extension [ChiRho](https://github.com/BasisResearch/chirho). Tractable
-circuits [@darwiche2022causalinferenceusingtractable] also present a new
-paradigm for generic estimation. Such a generalization would be a lofty
-achievement and enable the automation of downstream applications in experimental
-design.
+pathways often contain feedback loops. Further, missing data identification
+algorithms can handle when data is missing not at random (MNAR) by modeling the
+underlying missingness mechanism [@mohan2021]. Many algorithms covered by the
+review by [@JSSv099i05], such as generalized ID (`gID`) [@lee2019general] and
+generalized counterfactual ID (`gID*`) [@correa2021counterfactual] can be
+formulated as special cases of counterfactual transportability. Therefore, we
+also plan to improve the user experience to using more powerful algorithms like
+counterfactual transport through a simplified API.
+
+Similarly, we would like to implement probabilistic expression simplification
+described by [@tikka2017b] to make reading estimands easier.
+
+It remains an open research question on how to estimate the causal effect for an
+arbitrary estimand produced by an algorithm more sophisticated than `ID`. Two
+potential avenues for overcoming this might be a combination of the Pyro
+probabilistic programming langauge [@bingham2018pyro] and its causal inference
+extension [ChiRho](https://github.com/BasisResearch/chirho). Tractable circuits
+[@darwiche2022causalinferenceusingtractable] also present a new paradigm for
+generic estimation. Such a generalization would be a lofty achievement and
+enable the automation of downstream applications in experimental design.
 
 # Availability and usage
 