# Practical companion to "Verifying the existence of maximum likelihood estimates for generalized linear models" (Correia, Guimarães, Zylkin)

This companion consists of three documents, plus a suite of test datasets, that complement the paper:

> Sergio Correia, Paulo Guimarães, Thomas Zylkin: "Verifying the existence of maximum likelihood estimates for generalized linear models"

The documents are:

1. [*Primer on nonexistence of estimates and statistical separation for Poisson models*](https://github.com/sergiocorreia/ppmlhdfe/blob/master/guides/nonexistence_primer.md): an introductory guide to understanding the non-existence problem, with a focus on Poisson models. It also discusses how to detect this issue and explains solutions, including our "iterative rectifier" method.
2. [*Examples of nonexistence of estimates for Poisson, Logit, and Multinomial Logit models*](https://github.com/sergiocorreia/ppmlhdfe/blob/master/guides/nonexistence_examples.md): discusses several canonical examples of non-existence and how our "iterative rectifier" addresses them. Examples include Logit, Multinomial Logit, and Poisson. It further presents seventeen new Poisson examples that can be used to test software implementations of existing separation algorithms as well as to benchmark the performance of future algorithms.
3. [*Nonexistence of estimates of Poisson models across different statistical packages*](https://github.com/sergiocorreia/ppmlhdfe/blob/master/guides/nonexistence_benchmarks.md): documents how non-existence affects some of the most popular statistical packages (Stata, R, Julia, Matlab), which either fail to converge or converge to incorrect solutions.

Also see:

- [*Main page for the `ppmlhdfe` Stata package*](https://github.com/sergiocorreia/ppmlhdfe), including some [undocumented options](https://github.com/sergiocorreia/ppmlhdfe/blob/master/guides/undocumented.md) that can be used to illustrate and diagnose non-existence issues.
- [*Suite of 17 Poisson examples exhibiting non-existence*](https://github.com/sergiocorreia/ppmlhdfe/tree/master/guides/separation_datasets)

`guides/nonexistence_benchmarks.md`: 1 addition, 1 deletion
@@ -6,7 +6,7 @@
To the best of our knowledge, no existing statistical software addresses the separation problem in a robust way, even more so when working with fixed effects. In this post, we study a few simple examples of separation and how they affect some of the most popular statistical packages.

- We also include [18 example datasets](https://github.com/sergiocorreia/ppmlhdfe/tree/master/test/separation_datasets) that can be used by package developers to test for correctness of their programs, and we invite further additions to this list.
+ We also include [17 example datasets](https://github.com/sergiocorreia/ppmlhdfe/tree/master/guides/separation_datasets) that can be used by package developers to test for correctness of their programs, and we invite further additions to this list.

Note that this is in no way a critique of the packages discussed below, which are in our opinion of excellent quality.
We use them merely to show that separation is not only a [theoretical](http://scorreia.com/research/separation.pdf) problem, but also a practical one.
*(These examples complement [Verifying the existence of maximum likelihood estimates for generalized linear models](https://arxiv.org/abs/1903.01633); please see the links above for related guides.)*
@@ -355,6 +355,136 @@ As we can see, `ppmlhdfe` drops two observations as well as the variable `x2`. A
If you are a Stata user, you can run the script [`6-cgz-poisson-benchmarks.do`](code/6-cgz-poisson-benchmarks.do) in order to run all seventeen tests. Alternatively, it should be feasible to construct an equivalent for-loop in any statistical programming language.
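For readers outside Stata, such a loop might look like the Python sketch below. It is only an illustration: the file layout (one CSV per example, with an outcome column `y`, regressors whose names start with `x`, and a 0/1 column `separated` marking the observations expected to be flagged) and the `detect` callback are assumptions, not a description of the actual dataset format, so adjust them to the files in the repository.

```python
import csv
from pathlib import Path


def run_benchmarks(folder, detect):
    """Apply `detect(y, X)` to every CSV in `folder`; report whether it matches.

    ASSUMED layout (hypothetical): columns `y`, `x*` regressors, and a 0/1
    column `separated` holding the expected flags. `detect` is any
    user-supplied function returning one boolean per observation.
    """
    results = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with open(path, newline="") as fh:
            rows = list(csv.DictReader(fh))
        if not rows:
            continue  # skip empty files
        x_cols = [c for c in rows[0] if c.startswith("x")]
        y = [float(r["y"]) for r in rows]
        X = [[float(r[c]) for c in x_cols] for r in rows]
        expected = [bool(int(float(r["separated"]))) for r in rows]
        flagged = [bool(f) for f in detect(y, X)]
        results[path.name] = flagged == expected
    return results
```

Calling `run_benchmarks("separation_datasets", my_detector)` would then return a dictionary mapping each file name to `True` or `False`, depending on whether the detector reproduced the expected flags.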
## Tobit (Type I Tobit model)

Santos Silva and Tenreyro (2011) discuss a Tobit model left-censored at zero that suffers from nonexistence. In particular, the model's likelihood is maximized only in the limit `b_{z} -> +∞`; i.e., as the coefficient on `z` approaches infinity.
In this specific example, the `ppml` command can detect this issue:

```stata
. ppml y x z, check

note: checking the existence of the estimates

Number of regressors excluded to ensure that the estimates exist: 1
Excluded regressors: z
Number of observations excluded: 2
```
However, as explained in our [discussion of software packages](https://github.com/sergiocorreia/ppmlhdfe/blob/master/guides/nonexistence_benchmarks.md), this check can detect only some specific instances of separation.

Instead, a more general alternative is to repurpose `ppmlhdfe` to detect and exclude separation (notice how the Tobit regression omits `z` due to collinearity, as well as the two separated observations):

```stata
. ppmlhdfe y x z
(simplex method dropped 2 separated observations)
note: 1 variable omitted because of collinearity: z
<some output omitted...>
```
Finally, we can even use `ppmlhdfe`'s diagnostic tools to identify the specific directions of recession and the exact linear combination of regressors that drives the separation:

```stata
. ppmlhdfe y x z, tagsep(sep) zvar(z) r2
(identifying separated observations instead of running regressions)
<some output omitted...>
(ReLU method dropped 2 separated observations in 1 iterations)
<some output omitted...>
```
As shown in the last regression, `ppmlhdfe` identifies that the estimate for only one regressor does not exist: the one with a non-zero coefficient in that regression.
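The "ReLU method" mentioned in the output refers to the iterative rectifier. The Python block below is a minimal sketch of that idea, assuming the weighted least squares formulation described in the primer: regress an indicator of `y == 0` on the regressors with very large weights on the `y > 0` observations, rectify the fitted values, and repeat until no fitted value is negative. It is not the `ppmlhdfe` implementation; fixed effects are left out, and the function name `detect_separation`, the tolerance, and the weight constant `K` are illustrative assumptions.

```python
import numpy as np


def detect_separation(y, X, tol=1e-5, max_iter=100):
    """Flag separated observations (sketch; X should include a constant)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    u = (y == 0).astype(float)            # working outcome: 1 where y == 0
    if u.sum() == 0:
        return np.zeros(len(y), dtype=bool)  # no zeros, nothing to separate
    K = np.ceil(u.sum() / tol**2)         # assumed (large) weight for y > 0 rows
    sw = np.sqrt(np.where(y > 0, K, 1.0))
    for _ in range(max_iter):
        # Weighted least squares of u on X; the weights push fits on y > 0 to 0
        beta, *_ = np.linalg.lstsq(X * sw[:, None], u * sw, rcond=None)
        xb = X @ beta
        xb[np.abs(xb) < tol] = 0.0        # treat tiny fitted values as exact zeros
        if (xb >= 0).all():
            return xb > 0                 # positive fitted value => separated
        u = np.maximum(xb, 0.0)           # rectifier (ReLU) step
    raise RuntimeError("iterative rectifier did not converge")


# Toy data (made up): regressor d is non-zero only on two y == 0 observations.
y_demo = [0, 0, 0, 1, 2, 3]
X_demo = np.column_stack([np.ones(6), [1, 2, 3, 1, 2, 4], [1, 1, 0, 0, 0, 0]])
flags_demo = detect_separation(y_demo, X_demo)
print(flags_demo)
```

In this toy example the first two observations are flagged, mirroring the two separated observations dropped in the Stata output, while the third zero outcome is kept because no linear combination of the regressors isolates it.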
## References
- Palmgren (1981). "Models for the analysis of contingency tables with quantitative outcome variables". Biometrika, 68(3):563–576. https://www.jstor.org/stable/2335606
@@ -364,7 +494,7 @@ If you are a Stata user, you can run the script [`6-cgz-poisson-benchmarks.do`](
- Correia, Guimarães, and Zylkin (2019). "Verifying the existence of maximum likelihood estimates for generalized linear models". arXiv Working Paper: https://arxiv.org/abs/1903.01633
- Kosmidis and Schumacher (2021). "`detectseparation`: Detect and Check for Separation and Infinite Maximum Likelihood Estimates". https://cran.r-project.org/web/packages/detectseparation/
- Kosmidis (2017). "`brglm2`: Bias Reduction in Multinomial Models". https://cran.r-project.org/web/packages/brglm2/vignettes/multinomial.html
- - Geyer (2009). "Likelihood Inference in Exponential Families and Directions of Recession." University of Minnesota, School of Statistics. http://www.stat.umn.edu/geyer/5421/notes/infinity.pdf
+ - Geyer (2009). "Likelihood Inference in Exponential Families and Directions of Recession." University of Minnesota, School of Statistics. https://arxiv.org/pdf/0901.0455