-The mean is the average value of the distribution, the median is the value that divides the distribution into two equal halves, and the mode is the value with the highest probability density. In symmetric distributions like the `Normal`, these three measures are equal. But in skewed distributions like the `Gamma`, they can be quite different. The mean is probably the most commonly used measure of central tendency, but it can be sensitive to outliers. The median is more robust to outliers and is often preferred in skewed distributions. The mode is useful when we are interested in the most probable value of the distribution, the mode is usually less common than the other too. In practice, the choice of which measure to use depends on the context and the specific characteristics of the distribution being analyzed.
+The mean is the average value of the distribution, the median is the value that divides the distribution into two equal halves, and the mode is the value with the highest probability density. In symmetric distributions like the `Normal`, these three measures are equal. But in skewed distributions like the `Gamma`, they can be quite different.
-In ArviZ we can compute these point estimates from samples using the `azp.mean()`, `azp.median()`, and `azp.mode()` functions.
+The mean is probably the most commonly used measure of central tendency, but it can be sensitive to outliers. The median is more robust to outliers and is often preferred in skewed distributions. The mode is useful when we are interested in the most probable value of the distribution, although it is used less often than the other two. In practice, the choice of which measure to use depends on the context and the specific characteristics of the distribution being analyzed.
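To make the difference between the three measures concrete, here is a minimal sketch that uses only the Python standard library rather than ArviZ. The `Gamma(2, 1)` choice, the sample size, and the `histogram_mode` helper are assumptions made for illustration; the helper is a crude histogram-based estimate, not the method ArviZ uses.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Draw samples from a right-skewed Gamma(shape=2, scale=1) distribution.
# In theory its mode is 1, its median is about 1.68, and its mean is 2.
samples = [random.gammavariate(2.0, 1.0) for _ in range(20_000)]

mean = statistics.fmean(samples)
median = statistics.median(samples)


def histogram_mode(values, bins=50):
    """Crude mode estimate: midpoint of the most populated histogram bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value falls in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    best = counts.index(max(counts))
    return lo + (best + 0.5) * width


mode = histogram_mode(samples)
print(f"mean={mean:.2f}, median={median:.2f}, mode={mode:.2f}")
```

For a right-skewed distribution like this one, the three estimates should come out ordered as mode < median < mean, because the long right tail pulls the mean the furthest.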
+
+In ArviZ, we can compute these point estimates from samples using the `azp.mean()`, `azp.median()`, and `azp.mode()` functions.
-Point estimates also show in other places in ArviZ, for example in the `plot_dist()` or when calling `summary()`, these functions has a `point_estimate` argument to choose which point estimate to use. The default value is controlled globally by:
+Point estimates also show up in other places in ArviZ, for example, in `plot_dist()` or when calling `summary()`. Some functions have a `point_estimate` argument to choose which point estimate to use. The default value is controlled globally by:
To describe the uncertainty in our estimates, we usually want to complement point estimates with some measure of dispersion. A common approach is to use the standard deviation or the variance. The standard deviation is usually preferred over the variance because it is in the same units as the original data. However, both measures can be misleading for skewed distributions or distributions with heavy tails. Hence, other measures of dispersion are often preferred like the median absolute deviation [MAD](https://en.wikipedia.org/wiki/Median_absolute_deviation) or the interquartile range [IQR](https://en.wikipedia.org/wiki/Interquartile_range).
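The dispersion measures mentioned above can be sketched with the Python standard library alone. This is an illustrative example, not ArviZ code: the `Gamma(2, 1)` samples are an assumption, and the MAD is computed here as the unscaled median of absolute deviations from the median.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Right-skewed samples: Gamma(shape=2, scale=1).
samples = [random.gammavariate(2.0, 1.0) for _ in range(20_000)]

# Standard deviation: same units as the data, but sensitive to the long tail.
std = statistics.stdev(samples)

# Median absolute deviation (unscaled): median distance from the median.
med = statistics.median(samples)
mad = statistics.median(abs(x - med) for x in samples)

# Interquartile range: distance between the first and third quartiles.
q1, _, q3 = statistics.quantiles(samples, n=4)
iqr = q3 - q1

print(f"std={std:.2f}, MAD={mad:.2f}, IQR={iqr:.2f}")
```

The three numbers are on different scales, so they are not directly interchangeable; what matters is that the MAD and the IQR depend only on order statistics, which makes them less sensitive to extreme values.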
-One issue with using a single number to summarize uncertainty is that it does not provide information about the shape of the distribution. For example two distributions can have the same standard deviation but very different shapes. Or for bounded distributions like the `Gamma`, the standard deviation can be misleading as it does not take into account that negative values are not allowed.
+One issue with using a single number to summarize uncertainty is that it does not provide information about the shape of the distribution. For example, two distributions can have the same standard deviation but very different shapes. Or for bounded distributions like the `Gamma`, the standard deviation can be misleading, as it does not take into account that negative values are not allowed.
-One popular way to summarize the uncertainty in a distribution is to use intervals. For example, we may want to report that 90% of the values lie within a certain range. In Bayesian statistics, these often called `credible intervals`. In principle we can defined infinite intervals containing a given mass. Then we need to add some other constraint to build useful intervals. Two common types of credible intervals are the `equal-tailed interval` (ETI) and the `highest-density interval` (HDI).
+One popular way to summarize the uncertainty in a distribution is to use intervals. For example, we may want to report that 90% of the values lie within a certain range. In Bayesian statistics, these are often called `credible intervals`. In principle, we can define infinitely many intervals containing a given mass, so we need to add some other constraint to build useful intervals. Two common types of credible intervals are:
-* ETI: The interval that contains a given percentage of the distribution, with equal probability in both tails. For example, a 90% equal-tailed interval
+* The `equal-tailed interval` (ETI): The interval that contains a given percentage of the distribution, with equal probability in both tails. For example, a 90% equal-tailed interval
has 90% of the distribution between the lower and upper bounds, with 5% of the distribution in each tail.
-* HDI: The interval that contains a given mass and where all points inside the interval have a higher density than any point outside the interval. Alternatively, we can think of it as the shortest interval containing a given portion of the probability density.
+* The `highest-density interval` (HDI): The interval that contains a given mass and where all points inside the interval have a higher density than any point outside the interval. Alternatively, we can think of it as the shortest interval containing a given portion of the probability density.
For some distributions like asymmetric ones, the HDI is usually preferred over the ETI because it better represents the most credible values of the distribution.
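The two interval definitions can be compared on samples with a short standard-library sketch. This is an illustration under stated assumptions, not the ArviZ implementation: the `Gamma(2, 1)` samples and the 90% mass are arbitrary choices, and the HDI is computed with the "shortest window over sorted samples" approach, which is only appropriate for unimodal distributions.

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible

# Sorted samples from a right-skewed Gamma(shape=2, scale=1).
samples = sorted(random.gammavariate(2.0, 1.0) for _ in range(20_000))
n = len(samples)
mass = 0.90

# ETI: leave (1 - mass) / 2 of the samples in each tail.
tail = (1.0 - mass) / 2.0
lower_idx = round(tail * n)
upper_idx = round((1.0 - tail) * n) - 1
eti = (samples[lower_idx], samples[upper_idx])

# HDI (unimodal case): the shortest window holding `mass` of the samples.
window = round(mass * n)
start = min(
    range(n - window + 1),
    key=lambda i: samples[i + window - 1] - samples[i],
)
hdi = (samples[start], samples[start + window - 1])

print(f"ETI: ({eti[0]:.2f}, {eti[1]:.2f})  width={eti[1] - eti[0]:.2f}")
print(f"HDI: ({hdi[0]:.2f}, {hdi[1]:.2f})  width={hdi[1] - hdi[0]:.2f}")
```

Because the ETI is itself one of the candidate windows, the HDI computed this way can never be wider than the ETI. For this right-skewed example, the HDI shifts toward zero, where the density is higher.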