Skip to content

GH-16676 GLM: Remove offset effects#16749

Open
maurever wants to merge 6 commits intorel-3.46.0from
maurever_GH-16676_remove_offset_effects
Open

GH-16676 GLM: Remove offset effects#16749
maurever wants to merge 6 commits intorel-3.46.0from
maurever_GH-16676_remove_offset_effects

Conversation

@maurever
Copy link
Contributor

Issue: #16676

@maurever maurever self-assigned this Jan 22, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new experimental GLM option remove_offset_effects to keep offsets during training but remove their effect during scoring/prediction and model metrics, aligning with the “restricted vs unrestricted” model pattern already used for control_variables.

Changes:

  • Introduces remove_offset_effects parameter in GLM (backend + REST schema) and exposes it in R/Python clients.
  • Updates GLM scoring/metrics/scoring-history flow to compute both restricted (offset removed) and unrestricted metrics, and enables make_unrestricted_glm_model for this use case.
  • Adds docs + new tests/examples across Java/R/Python to exercise the feature.

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 20 comments.

Show a summary per file
File Description
h2o-algos/src/main/java/hex/glm/GLM.java Implements restricted/unrestricted scoring-history + metrics computation when remove_offset_effects is enabled.
h2o-algos/src/main/java/hex/glm/GLMModel.java Adds new parameter + basic validation for remove_offset_effects.
h2o-algos/src/main/java/hex/glm/GLMScore.java Skips adding offset into the linear predictor when restricted scoring is enabled.
h2o-algos/src/main/java/hex/glm/GLMUtils.java Renames/extends scoring history combiner for “restricted” use.
h2o-algos/src/main/java/hex/schemas/GLMV3.java Exposes remove_offset_effects via REST schema.
h2o-algos/src/main/java/hex/api/MakeGLMModelHandler.java Allows creating unrestricted model when remove_offset_effects was used; resets the flag on the derived model.
h2o-algos/src/test/java/hex/glm/GLMControlVariablesTest.java Adds backend tests for remove_offset_effects behavior and its interaction with control_variables.
h2o-r/h2o-package/R/glm.R Adds R API parameter + expands make_unrestricted_glm_model guard.
h2o-bindings/bin/custom/R/gen_glm.py Updates generated R binding template for make_unrestricted_glm_model guard.
h2o-r/tests/testdir_algos/glm/runit_GLM_remove_offset_effects_explain.R Adds an R explain/learning-curve smoke test with remove_offset_effects.
h2o-py/h2o/estimators/glm.py Adds Python API parameter + getter/setter.
h2o-py/tests/testdir_algos/glm/pyunit_remove_offset_effects.py Adds Python test comparing behavior with/without remove_offset_effects.
h2o-py/tests/testdir_algos/glm/pyunit_remove_offset_glm.py Adds Python test scaffold around offset scoring behavior.
h2o-docs/src/product/data-science/algo-params/remove_offset_effects.rst Documents the new parameter and provides examples.
h2o-docs/src/product/data-science/algo-params/control_variables.rst Links control_variables docs to remove_offset_effects docs.
h2o-core/src/main/java/hex/ModelMetricsBinomial.java Minor signature cleanup (parameter rename).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 6 to 7
glm_control_variables_explain <- function() {
df <- h2o.importFile("https://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv")
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test helper function name is still glm_control_variables_explain, which is misleading for a remove_offset_effects test and makes failures harder to triage. Please rename it (and the doTest description) to reflect remove_offset_effects.

Copilot uses AI. Check for mistakes.
h2o.explain(unrestricted_prostate_glm, df)
}

doTest("GLM: Control variables works with expain", glm_control_variables_explain)
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix typo in test description string: "expain" → "explain".

Suggested change
doTest("GLM: Control variables works with expain", glm_control_variables_explain)
doTest("GLM: Control variables works with explain", glm_control_variables_explain)

Copilot uses AI. Check for mistakes.
if(preds.vec(2).at(i) != preds2.vec(2).at(i)) differ++;
}
System.out.println(differ + " " + threshold);
assert differ > threshold;
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This uses the Java assert keyword, which is typically disabled unless tests are run with -ea, so the check may not execute in CI. Prefer JUnit assertions (eg Assert.assertTrue(...)) to ensure the test always enforces the condition.

Suggested change
assert differ > threshold;
assertTrue("Expected number of differing predictions to exceed threshold", differ > threshold);

Copilot uses AI. Check for mistakes.
Comment on lines 4083 to 4085
} if (_model._parms._remove_offset_effects) {
_scoringHistoryUnrestrictedModel.addIterationScore(_state._iter, _state.likelihood(), _state.objective());
_scoringHistory.addIterationScore(_state._iter, _state.likelihood(), _state.objective());
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updateProgress(): this if (_remove_offset_effects) ... else ... block is not an else if to the preceding control_variables branch, so when control_variables is enabled the code will still fall into the else here and add an extra _scoringHistory.addIterationScore(...) each iteration. This can corrupt scoring history / early stopping bookkeeping; make these branches mutually exclusive.

Suggested change
} if (_model._parms._remove_offset_effects) {
_scoringHistoryUnrestrictedModel.addIterationScore(_state._iter, _state.likelihood(), _state.objective());
_scoringHistory.addIterationScore(_state._iter, _state.likelihood(), _state.objective());
} else if (_model._parms._remove_offset_effects) {
_scoringHistoryUnrestrictedModel.addIterationScore(_state._iter, _state.likelihood(), _state.objective());
_scoringHistory.addIterationScore(_state._iter, _state.likelihood(), _state.objective());

Copilot uses AI. Check for mistakes.

Model metrics and scoring history are calculated for both the restricted model (with offset effects removed) and the unrestricted model (with offset effect included).

To get the unrestricted model with its own metrics use ``glm.make_unrestriced_glm_model()``/``h2o.make_unrestricted_glm_model(glm)``.
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs: the reference to glm.make_unrestriced_glm_model() appears to be a typo/non-existent function (also "unrestriced" misspelled). This should point to the actual API (h2o.make_unrestricted_glm_model in R / make_unrestricted_glm_model in Python) to avoid broken guidance.

Suggested change
To get the unrestricted model with its own metrics use ``glm.make_unrestriced_glm_model()``/``h2o.make_unrestricted_glm_model(glm)``.
To get the unrestricted model with its own metrics use ``h2o.make_unrestricted_glm_model(glm)`` (R) or ``make_unrestricted_glm_model(glm)`` (Python).

Copilot uses AI. Check for mistakes.
mtrain._nobs, _model._output._validation_metrics._nobs, _state.lambda(), _state.alpha());
} else { // only doing training deviance
_scoringHistory.addIterationScore(true, false, _state._iter, _state.likelihood(),
_state.objective(), _state.deviance(), Double.NaN, mtrain._nobs, 1, _state.lambda(),
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable mtrain may be null at this access as suggested by this null guard.
Variable mtrain may be null at this access as suggested by this null guard.
Variable mtrain may be null at this access as suggested by this null guard.

Copilot uses AI. Check for mistakes.
glm_model.train(x=["name", "power", "year"], y="economy_20mpg", training_frame=cars)

predictions_train = glm_model.predict(cars).as_data_frame()
metrics = glm_model.training_model_metrics()
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable metrics is not used.

Copilot uses AI. Check for mistakes.
Comment on lines 30 to 31
metrics_2 = glm_model_2.training_model_metrics()
#print(metrics_2)
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable metrics_2 is not used.

Suggested change
metrics_2 = glm_model_2.training_model_metrics()
#print(metrics_2)
#print(glm_model_2.training_model_metrics())

Copilot uses AI. Check for mistakes.
glm_model_roe.train(x=["name", "power", "year"], y="economy_20mpg", training_frame=cars)

predictions_train_cv = glm_model_roe.predict(cars).as_data_frame()
metrics_cv = glm_model_roe.training_model_metrics()
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable metrics_cv is not used.

Suggested change
metrics_cv = glm_model_roe.training_model_metrics()
glm_model_roe.training_model_metrics()

Copilot uses AI. Check for mistakes.
generate_scoring_history=True)
glm_model_roe_2.train(x=["name", "power", "year"], y="economy_20mpg", training_frame=cars)
predictions_train_cv2 = glm_model_roe_2.predict(cars).as_data_frame()
metrics_cv_2 = glm_model_roe_2.training_model_metrics()
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable metrics_cv_2 is not used.

Suggested change
metrics_cv_2 = glm_model_roe_2.training_model_metrics()
glm_model_roe_2.training_model_metrics()

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant