
Commit 4a5100d

DOC: Add statistical references to EABM (#176)
* DOC: Add reference to "The Book of Statistical Proofs"
* DOC: Add citation to "The Book of Statistical Proofs"
* DOC: Add another reference to The Book of Statistical Proofs
* DOC: Add links to preliz functions
* DOC: Add another link to maxent in preliz

  Updated the reference to the `maxent` function to include a link for better clarity.
1 parent d5150ea commit 4a5100d

4 files changed: +17 -9 lines changed

Chapters/Model_comparison.qmd

Lines changed: 3 additions & 3 deletions
@@ -98,7 +98,7 @@ for p, ax in zip([0.5, 0.1, 0.9, 0.0001], axes.ravel()):
     ax.set_ylim(-0.05, 1.05)
 ```
 
-The concept of entropy appears many times in statistics. It can be useful, for example, when defining priors: in general we want to use a prior that has maximum entropy given our knowledge (see for example [PreliZ](https://preliz.readthedocs.io/en/latest/)'s `maxent` function). It is also useful when comparing models, as we will see in the next section.
+The concept of entropy appears many times in statistics. It can be useful, for example, when defining priors: in general we want to use a prior that has maximum entropy given our knowledge (see for example [PreliZ](https://preliz.readthedocs.io/en/latest/)'s [`maxent`](https://preliz.readthedocs.io/en/latest/unidimensional.html#preliz.unidimensional.maxent) function). It is also useful when comparing models, as we will see in the next section.
 
 ## KL divergence {#sec-kl-divergence}
 
@@ -118,7 +118,7 @@ $$
 $$
 
 
-Suppose $p$ represents the **data generating process**, the **population**, or the **true** distribution, and $q$ represents our model. It may seem that these expressions are all useless because we don't know $p$; that is the reason we are trying to fit a model in the first place. But if our goal is to compare $m$ models represented by $q_0, q_1 \cdots q_m$, we can still use the KL divergence to compare them! The reason is that even when we do not know $p$, its entropy is a constant term for all comparisons.
+Suppose $p$ represents the **data generating process**, the **population**, or the **true** distribution, and $q$ represents our model. It may seem that these expressions are all useless because we don't know $p$; that is the reason we are trying to fit a model in the first place. But if our goal is to compare $m$ models represented by $q_0, q_1, \cdots, q_m$, we can still use the KL divergence to compare them! The reason is that even when we do not know $p$, its entropy is a constant term for all comparisons.
 
 $$
 \begin{split}
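The hunk above argues that $H(p)$ is a shared constant across candidates, so ranking models by KL divergence is the same as ranking them by cross-entropy. A small numeric sketch of that point (the distributions are made up for illustration; this is not part of the commit):

```python
import numpy as np
from scipy.stats import entropy

p = np.array([0.1, 0.2, 0.3, 0.4])       # stand-in for the unknown "true" distribution
q0 = np.array([0.25, 0.25, 0.25, 0.25])  # candidate model 0
q1 = np.array([0.15, 0.25, 0.25, 0.35])  # candidate model 1

for name, q in [("q0", q0), ("q1", q1)]:
    kl = entropy(p, q)                   # KL(p || q)
    cross = -(p * np.log(q)).sum()       # cross-entropy H(p, q) = H(p) + KL(p || q)
    print(f"{name}: KL = {kl:.4f}, H(p, q) = {cross:.4f}, H(p) = {cross - kl:.4f}")
# H(p) comes out identical for both candidates, so either quantity gives the same ranking.
```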
@@ -493,7 +493,7 @@ $$
 BF_{01} = \frac{p(y \mid H_0)}{p(y \mid H_1)} = \frac{p(\theta=0.5 \mid y, H_1)}{p(\theta=0.5 \mid H_1)}
 $$
 
-This is true only when $H_0$ is a particular case of $H_1$; [see](https://statproofbook.github.io/P/bf-sddr).
+This is true only when $H_0$ is a particular case of $H_1$; see [The Book of Statistical Proofs](https://statproofbook.github.io/P/bf-sddr) [@soch_2024].
 
 Let's do it. We only need to sample the prior and posterior for a model. Let's try the BetaBinomial model with a Uniform prior:
 
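The sampling cell itself is not included in this diff. A minimal sketch of the idea, assuming PyMC for sampling and ArviZ's `plot_bf` for the Savage-Dickey ratio, with hypothetical data:

```python
import numpy as np
import arviz as az
import pymc as pm

y = np.repeat([0, 1], [4, 6])  # hypothetical data: 6 successes in 10 trials

with pm.Model() as model:
    theta = pm.Beta("theta", 1, 1)            # Uniform prior on theta
    pm.Bernoulli("obs", p=theta, observed=y)
    idata = pm.sample()
    idata.extend(pm.sample_prior_predictive())

# Savage-Dickey: compare prior and posterior densities at theta = 0.5
az.plot_bf(idata, var_name="theta", ref_val=0.5)
```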
Chapters/Prior_elicitation.qmd

Lines changed: 5 additions & 5 deletions
@@ -107,7 +107,7 @@ For some priors in a model, we may know or assume that most of the mass is withi
 
 ## Maximum entropy distributions with maxent
 
-In PreliZ we can compute maximum entropy priors using the function `maxent`. It works for unidimensional distributions. The first argument is a PreliZ distribution. Then we specify an upper and lower bound and the probability between them.
+In PreliZ we can compute maximum entropy priors using the function [`maxent`](https://preliz.readthedocs.io/en/latest/unidimensional.html#preliz.unidimensional.maxent). It works for unidimensional distributions. The first argument is a PreliZ distribution. Then we specify an upper and lower bound and the probability between them.
 
 As an example, we want to elicit a scale parameter. From domain knowledge we know the parameter has a relatively high probability of being less than 3. Hence, we could use a HalfNormal distribution and do:
 
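The cell that follows "and do:" is not shown in this diff. A sketch of what it might look like, under the assumption that "relatively high probability" means 90% of the mass (the 0.9 is an illustrative choice):

```python
import preliz as pz

# Maximum-entropy HalfNormal with 90% of its mass between 0 and 3
pz.maxent(pz.HalfNormal(), lower=0, upper=3, mass=0.9)
```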
@@ -140,10 +140,10 @@ dist_mean.summary(), dist_mode.summary()
 
 ## Other direct elicitation methods from PreliZ
 
-There are many other methods for direct elicitation of parameters. For instance, the [quartile](https://preliz.readthedocs.io/en/latest/unidimensional.html#preliz.unidimensional.quartile) function identifies a distribution that matches specified
-quartiles, and [QuartileInt](https://preliz.readthedocs.io/en/latest/unidimensional.html#preliz.unidimensional.QuartileInt) provides an interactive approach to achieve the same, offering a more hands-on experience for refining distributions.
+There are many other methods for direct elicitation of parameters. For instance, the [`quartile`](https://preliz.readthedocs.io/en/latest/unidimensional.html#preliz.unidimensional.quartile) function identifies a distribution that matches specified
+quartiles, and [`QuartileInt`](https://preliz.readthedocs.io/en/latest/unidimensional.html#preliz.unidimensional.QuartileInt) provides an interactive approach to achieve the same, offering a more hands-on experience for refining distributions.
 
-One method worthy of special mention is the [Roulette](https://preliz.readthedocs.io/en/latest/unidimensional.html#preliz.unidimensional.Roulette) method, which allows users to find a prior distribution by drawing it interactively [@morris_2014]. The name "roulette" comes from the analogy of placing a limited set of chips where one believes the mass of a distribution should be concentrated. In this method, a grid of `m` equally sized bins is provided, covering the range of `x`, and users allocate a total of `n` chips across the bins. Effectively, this creates a histogram, representing the user's information about the distribution. The method then identifies the best-fitting distribution from a predefined pool of options, translating the drawn histogram into a suitable probabilistic model.
+One method worthy of special mention is the [`Roulette`](https://preliz.readthedocs.io/en/latest/unidimensional.html#preliz.unidimensional.Roulette) method, which allows users to find a prior distribution by drawing it interactively [@morris_2014]. The name "roulette" comes from the analogy of placing a limited set of chips where one believes the mass of a distribution should be concentrated. In this method, a grid of `m` equally sized bins is provided, covering the range of `x`, and users allocate a total of `n` chips across the bins. Effectively, this creates a histogram, representing the user's information about the distribution. The method then identifies the best-fitting distribution from a predefined pool of options, translating the drawn histogram into a suitable probabilistic model.
 
 As this is an interactive method we can't show it here, but you can run the following cell to see how it works.
 
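For readers who have not used these functions, a rough sketch of the calls (all numbers and ranges below are hypothetical, not from the book):

```python
import preliz as pz

# Find the Gamma whose quartiles best match 2, 4, and 7 (hypothetical values)
pz.quartile(pz.Gamma(), 2, 4, 7)

# Launch the interactive Roulette grid in a notebook; the x-range is an assumed
# example covering where we believe the mass could plausibly lie
pz.Roulette(x_min=0, x_max=10)
```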
@@ -253,7 +253,7 @@ The new priors still generate some values that are too wide, but at least the bu
 
 The process described in the previous section is straightforward: sample from the prior predictive --> plot --> refine --> repeat. On the good side, this is a very flexible approach and can be a good way to understand the effect of individual parameters on the predictions of a model. But it can be time-consuming and it requires some understanding of the model so you know which parameters to tweak and in which direction.
 
-One way to improve this workflow is by adding interactivity. We can do this with PreliZ's `predictive_explorer` function. We cannot show it here in its full glory, but you can see a static image in @fig-predictive-explorer, and you can try it for yourself by running the following block of code.
+One way to improve this workflow is by adding interactivity. We can do this with PreliZ's [`predictive_explorer`](https://preliz.readthedocs.io/en/latest/predictive.html#preliz.predictive.predictive_explorer) function. We cannot show it here in its full glory, but you can see a static image in @fig-predictive-explorer, and you can try it for yourself by running the following block of code.
 
 ```{python}
 #| eval: false
 
Chapters/Prior_posterior_predictive_checks.qmd

Lines changed: 1 addition & 1 deletion
@@ -454,7 +454,7 @@ $$
 U = F_Y(Y)
 $$
 
-follows a standard Uniform distribution. A proof of this result can be found in [The Book of Statistical Proofs](https://statproofbook.github.io/P/cdf-pit.html).
+follows a standard Uniform distribution. A proof of this result can be found in [The Book of Statistical Proofs](https://statproofbook.github.io/P/cdf-pit.html) [@soch_2024].
 
 In other words, if we apply the CDF of any continuous distribution to a random variable with that distribution, the result will be a random variable with a standard uniform distribution. This is a very powerful result, as it allows us to use the standard uniform distribution as a reference distribution for many statistical tests, including posterior predictive checks.
 
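A quick way to convince yourself of this probability integral transform result, using SciPy (an illustrative check, not part of the commit):

```python
import numpy as np
from scipy import stats

# Draw from an arbitrary continuous distribution, apply its own CDF,
# and test the transformed values for uniformity
dist = stats.norm(2, 5)
samples = dist.rvs(size=10_000, random_state=42)
u = dist.cdf(samples)              # U = F_Y(Y)
print(stats.kstest(u, "uniform"))  # large p-value: consistent with U(0, 1)
```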
references.bib

Lines changed: 8 additions & 0 deletions
@@ -861,3 +861,11 @@ @inproceedings{fernandes_2018
   year = {2018},
   pages = {1--12},
 }
+
+@misc{soch_2024,
+  title = {The Book of Statistical Proofs},
+  author = {Soch, Joram and Faulkenberry, Thomas J. and Petrykowski, Kenneth and Allefeld, Carsten},
+  year = {2024},
+  doi = {10.5281/ZENODO.4305949},
+  url = {https://statproofbook.github.io/},
+}
