GH-16583: Access GLM Variance-Covariance Matrix with vcov #16586
manh4wk wants to merge 6 commits into h2oai:master
@tomasfryda Do you know if there's anything else I can do right now to check whether this passes all the tests? I saw you mentioned in another thread that the team is pretty busy at the moment.
Hi @maurever or @valenad1, can one of you take a look at this? Having the GLM's variance-covariance matrix available will let us do things like run a Wald test on two different levels of a categorical variable to see whether they are statistically different or should be combined into a single category.
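As a rough illustration of the Wald test mentioned above, the sketch below compares two coefficients using their variances and their covariance. It is plain Python with no H2O calls; the function name `wald_test_equal`, the level names, and every number are made up for illustration. In practice the coefficients and the matrix would come from the fitted model.

```python
import math


def wald_test_equal(beta, vcov, names, a, b):
    """Wald test of H0: beta[a] == beta[b] (hypothetical helper, not an H2O API)."""
    i, j = names.index(a), names.index(b)
    diff = beta[i] - beta[j]
    # Var(b_i - b_j) = Var(b_i) + Var(b_j) - 2 * Cov(b_i, b_j)
    var_diff = vcov[i][i] + vcov[j][j] - 2.0 * vcov[i][j]
    if var_diff <= 0.0:
        raise ValueError("variance of the difference must be positive")
    z = diff / math.sqrt(var_diff)
    # Two-sided p-value under a standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Made-up coefficients and covariance matrix for two levels of one categorical column
names = ["Intercept", "region.A", "region.B"]
beta = [0.5, 1.2, 1.0]
vcov = [
    [0.04, 0.00, 0.00],
    [0.00, 0.02, 0.01],
    [0.00, 0.01, 0.02],
]

z, p_value = wald_test_equal(beta, vcov, names, "region.A", "region.B")
print(f"z = {z:.4f}, p = {p_value:.4f}")
```

A large p-value would suggest the two levels are not distinguishable at the chosen significance level; the off-diagonal covariance term is what the standard errors alone cannot provide.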
tomasfryda left a comment
I think it's a good idea to expose the variance-covariance matrix, but that probably depends on @valenad1's decision.
If he agrees, I would suggest fixing the R tests and making sure the column names and row names are the same. Currently the column names are always lowercased and the row names aren't. IIRC this can be caused by TwoDimTableV3, so I would consider choosing a different data structure (e.g. H2OFrame).
For example the Intercept vs intercept:

Note that I didn't do a complete review; I just looked at the R part of the PR.
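The naming mismatch described in this review can be checked mechanically. The following is a minimal sketch in plain Python; the helper `label_mismatches` and the label lists are hypothetical and only mimic the Intercept vs intercept case, they are not output from the PR.

```python
def label_mismatches(row_names, col_names):
    """Return (row, col) label pairs that differ at the same position."""
    if len(row_names) != len(col_names):
        raise ValueError("a variance-covariance matrix must be square")
    return [(r, c) for r, c in zip(row_names, col_names) if r != c]


# Hypothetical labels: rows keep their case, columns were lowercased
rows = ["Intercept", "AGE", "PSA"]
cols = ["intercept", "age", "psa"]

mismatches = label_mismatches(rows, cols)
# True when every mismatch disappears after lowercasing both labels
case_only = all(r.lower() == c.lower() for r, c in mismatches)
print(mismatches, case_only)
```

A test along these lines would fail as long as one axis is lowercased, which makes the inconsistency visible without inspecting the table by eye.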
manualYear <- mFV@model$coefficients_table$year

# compare values from model and obtained manually
for (ind in c(1:length(manuelPValues)))
manuelPValues doesn't seem to be defined anywhere. Also, I would recommend using seq_along(x) instead of 1:length(x): when x is empty, the latter produces c(1, 0), so the loop still iterates.
doTest("GLM: make sure error is generated when a gbm model calls glm functions", testGBMvcov)
doTest("GLM: make sure error is generated when compute_p_values=FALSE", testGLMvcovcomputePValueFALSE)
doTest("GLM: test variance-covariance values", testGLMPValZValStdError)
I would prefer something like:
doSuite("GLM: VCOV support", makeSuite(testGBMvcov, testGLMvcovcomputePValueFALSE, testGLMPValZValStdError))
There is no test that checks whether the implementation actually works. The tests only check that an error is thrown when it is used where it is unsupported.
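One way a value-level check could look is sketched below, relying on the fact that the standard errors of the coefficients are the square roots of the diagonal of the variance-covariance matrix. This is plain Python on made-up numbers, not the PR's test; the helper names and the tolerance are assumptions, and a real test would pull both the matrix and the standard errors from a model trained with compute_p_values enabled.

```python
import math


def std_errors_from_vcov(vcov):
    """Standard errors are the square roots of the diagonal entries."""
    return [math.sqrt(vcov[i][i]) for i in range(len(vcov))]


def check_vcov(vcov, reported_std_errors, tol=1e-8):
    """Hypothetical check: matrix is symmetric and agrees with reported std errors."""
    n = len(vcov)
    if len(reported_std_errors) != n:
        return False
    symmetric = all(
        math.isclose(vcov[i][j], vcov[j][i], rel_tol=0.0, abs_tol=tol)
        for i in range(n)
        for j in range(n)
    )
    se_match = all(
        math.isclose(s, r, rel_tol=0.0, abs_tol=tol)
        for s, r in zip(std_errors_from_vcov(vcov), reported_std_errors)
    )
    return symmetric and se_match


# Made-up 2x2 example: variances 0.04 and 0.09 give standard errors 0.2 and 0.3
example_vcov = [[0.04, 0.01], [0.01, 0.09]]
ok = check_vcov(example_vcov, [0.2, 0.3])
bad = check_vcov(example_vcov, [0.2, 0.31])
print(ok, bad)
```

This kind of assertion exercises the returned values themselves, in addition to the error paths already covered.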
Made the variance-covariance matrix for GLMs part of the model_output results so they're accessible by Python and R. The matrix is rearranged in h2o-algos/src/main/java/hex/schemas/GLMModelV3.java so that the Intercept is both the first row and the first column, similar to how it's done for the GLM coefficient results in the same area of the code.
This matrix is now accessible with the glm_model_object.vcov() function in Python and with h2o.vcov(glm_model_object) in R.
This change fixes #16583
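The rearrangement described above, moving the Intercept to the first row and first column, can be sketched as a symmetric permutation of the matrix. The snippet below is plain Python and is not the Java code in GLMModelV3.java; the function name, the starting position of the Intercept (last), and the values are assumptions chosen only to show the idea.

```python
def move_intercept_first(matrix, names, intercept="Intercept"):
    """Apply the same reordering to rows and columns so the intercept comes first."""
    k = names.index(intercept)
    order = [k] + [i for i in range(len(names)) if i != k]
    new_names = [names[i] for i in order]
    # Permuting rows and columns together keeps the matrix symmetric
    new_matrix = [[matrix[i][j] for j in order] for i in order]
    return new_matrix, new_names


# Made-up matrix with the intercept assumed to be stored last
names = ["AGE", "PSA", "Intercept"]
matrix = [
    [1, 2, 3],
    [2, 4, 5],
    [3, 5, 6],
]

reordered, reordered_names = move_intercept_first(matrix, names)
print(reordered_names)
print(reordered)
```

Using one index order for both axes is what guarantees that row and column labels stay aligned after the move.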