Adjust ftest #628
base: master
fb75f1a to deb4d2f
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff             @@
##           master     #628      +/-   ##
==========================================
- Coverage   96.98%   96.98%   -0.01%
==========================================
  Files           8        8
  Lines        1196     1193       -3
==========================================
- Hits         1160     1157       -3
  Misses         36       36
f9e1ccd to f759de4
This commit changes the degrees of freedom shown in the table from the number of estimated parameters (dof) to the residual degrees of freedom of the model (residual_dof). This makes it possible to easily calculate the F-test statistic from the other quantities in the table with the usual formulas.
Since nobs now returns floats, residual_dof also generally returns floats, so the degrees of freedom parameters are now stored as floats instead of integers.
The commit removes the R-squared quantities. They are not needed and I don't think they add value.
The commit also removes numbering of the rows in the output. I don't think it is helpful.
The show method is now for MIME"text/plain" since that is more appropriate for a "decorated" show method that spans multiple lines. In most cases this won't be visible to users.
f759de4 to 19fda81
Funny, I had made the opposite choice regarding DOF at #337 (comment). I'm not a fan of residual DOF because the number can be very high with large samples, but OK if you prefer that. Though I wouldn't call the column with residual degrees of freedom "dof", as that's the name of the function that returns the number of estimated parameters, not the residual DOF. Confusion is already quite common. Maybe "Res. DOF"? Using upper case would seem more consistent with SSR.
Also I'd rather keep the R²: it's interesting for comparing models, and its magnitude is more interpretable than SSR. It's not needed for the F-test, but we probably don't want to add another function just to compare models with their R² (Stata has
Ah, and something to take into account is that we should use the same output for |
The idea is that the numbers in the table are the ones that are used in the test statistic. The residual degrees of freedom are the relevant quantities for that.
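As a rough illustration of the "usual formulas" referred to here, the F statistic for two nested models can be computed from each model's SSR and residual degrees of freedom. This is only a sketch: the function name `f_statistic` is hypothetical (it is not part of the package), and the residual degrees of freedom 11 and 10 in the usage lines are assumed for illustration, since the thread does not state the sample size. The SSR values are the ones from the docstring table quoted in this PR.

```python
def f_statistic(ssr_small, rdf_small, ssr_large, rdf_large):
    """F statistic for comparing two nested least-squares models.

    ssr_*: residual sum of squares of each model.
    rdf_*: residual degrees of freedom of each model.
    "small" is the restricted model, "large" the one with more parameters.
    """
    delta_ssr = ssr_small - ssr_large   # drop in SSR from the extra terms
    delta_df = rdf_small - rdf_large    # number of extra parameters
    if delta_df <= 0 or rdf_large <= 0:
        raise ValueError("models must be nested with positive residual dof")
    # (change in SSR per extra parameter) / (residual variance of larger model)
    return (delta_ssr / delta_df) / (ssr_large / rdf_large)


# SSR values come from the table in this thread.
# The residual dof (11 and 10) are ASSUMED, not taken from the thread.
f = f_statistic(3.2292, 11.0, 0.1283, 10.0)
print(f)
```

With these assumed degrees of freedom the result lands close to the F* of 241.6234 shown in the old table; the inputs here are the rounded SSR values as printed, so an exact match is not expected.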
I don't think
Sum of squares are almost always upper case, but degrees of freedom is a mixed bag. SAS writes "DF" (but uses upper case a lot), Stata writes "df", R writes "Df", and two books in English on my shelf that cover ANOVA write "df" and "d.f." respectively (and SSR/SSE). Hence, I'd actually be in favor of just "df".
I think it is odd to include in an ANOVA table. The name of the function here is
Hadn't considered that so I'll take a look.
OK. But I'd still specify "Residual" or "Res.". R and Stata do this, it doesn't cost much and it's more explicit. The Δ column can stay that way as it's the same for both kinds of degrees of freedom. (I really don't understand why some software/papers use lower case "dof" given that acronyms are almost always in upper case, but that's less important...)
While we're bikeshedding appearance: are you really opposed to numbering lines? Other implementations do that, and thinking about it I think it can be useful when you have e.g. 5 models or more (I do that sometimes). Of course you can always count but it seems nice to make this easier, and it doesn't add much noise.
This commit changes the degrees of freedom shown in the table from the number of estimated parameters (dof) to the residual degrees of freedom of the model (residual_dof). This makes it possible to easily calculate the F-test statistic from the other quantities in the table with the usual formulas.
Since nobs now returns floats, residual_dof also generally returns floats, so the degrees of freedom parameters are now stored as floats instead of integers.
The commit removes the R-squared quantities. They are not needed and I don't think they add value.
The commit also removes numbering of the rows in the output. I don't think it is helpful.
The example from the docstring changes from
────────────────────────────────────────────────────────────────
     DOF  ΔDOF     SSR    ΔSSR      R²     ΔR²        F*   p(>F)
────────────────────────────────────────────────────────────────
[1]    2        3.2292          0.0000
[2]    3     1  0.1283  3.1008  0.9603  0.9603  241.6234  <1e-07
────────────────────────────────────────────────────────────────
to
The test statistic can be calculated as