GH-16676 GLM: Remove offset effects by maurever · Pull Request #16749 · h2oai/h2o-3

maurever · 2026-01-22T15:30:52Z

Copilot

Pull request overview

Adds a new experimental GLM option remove_offset_effects to keep offsets during training but remove their effect during scoring/prediction and model metrics, aligning with the “restricted vs unrestricted” model pattern already used for control_variables.

Changes:

Introduces remove_offset_effects parameter in GLM (backend + REST schema) and exposes it in R/Python clients.
Updates GLM scoring/metrics/scoring-history flow to compute both restricted (offset removed) and unrestricted metrics, and enables make_unrestricted_glm_model for this use case.
Adds docs + new tests/examples across Java/R/Python to exercise the feature.
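
As a rough illustration of the restricted vs. unrestricted pattern described above, the sketch below scores one row of a binomial GLM with and without the offset term. It is not H2O code: the function name `score_row`, the logit link, and all coefficient and input values are assumptions chosen purely for illustration.

```python
import math

def score_row(coefs, intercept, row, offset=0.0, remove_offset_effects=False):
    """Return P(y=1) for one row under a logit link (illustrative only)."""
    eta = intercept + sum(b * x for b, x in zip(coefs, row))
    # Unrestricted scoring adds the offset to the linear predictor;
    # restricted scoring (remove_offset_effects=True) leaves it out.
    if not remove_offset_effects:
        eta += offset
    return 1.0 / (1.0 + math.exp(-eta))

coefs, intercept = [0.5, -0.25], 0.1   # hypothetical fitted values
row, offset = [2.0, 4.0], 1.5          # hypothetical input row and offset

p_unrestricted = score_row(coefs, intercept, row, offset)
p_restricted = score_row(coefs, intercept, row, offset, remove_offset_effects=True)
```

With a positive offset, the unrestricted probability is higher than the restricted one; with a zero offset the two agree.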

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 20 comments.

File	Description
h2o-algos/src/main/java/hex/glm/GLM.java	Implements restricted/unrestricted scoring-history + metrics computation when remove_offset_effects is enabled.
h2o-algos/src/main/java/hex/glm/GLMModel.java	Adds new parameter + basic validation for remove_offset_effects.
h2o-algos/src/main/java/hex/glm/GLMScore.java	Skips adding offset into the linear predictor when restricted scoring is enabled.
h2o-algos/src/main/java/hex/glm/GLMUtils.java	Renames/extends scoring history combiner for “restricted” use.
h2o-algos/src/main/java/hex/schemas/GLMV3.java	Exposes remove_offset_effects via REST schema.
h2o-algos/src/main/java/hex/api/MakeGLMModelHandler.java	Allows creating unrestricted model when remove_offset_effects was used; resets the flag on the derived model.
h2o-algos/src/test/java/hex/glm/GLMControlVariablesTest.java	Adds backend tests for remove_offset_effects behavior and its interaction with control_variables.
h2o-r/h2o-package/R/glm.R	Adds R API parameter + expands make_unrestricted_glm_model guard.
h2o-bindings/bin/custom/R/gen_glm.py	Updates generated R binding template for make_unrestricted_glm_model guard.
h2o-r/tests/testdir_algos/glm/runit_GLM_remove_offset_effects_explain.R	Adds an R explain/learning-curve smoke test with remove_offset_effects.
h2o-py/h2o/estimators/glm.py	Adds Python API parameter + getter/setter.
h2o-py/tests/testdir_algos/glm/pyunit_remove_offset_effects.py	Adds Python test comparing behavior with/without remove_offset_effects.
h2o-py/tests/testdir_algos/glm/pyunit_remove_offset_glm.py	Adds Python test scaffold around offset scoring behavior.
h2o-docs/src/product/data-science/algo-params/remove_offset_effects.rst	Documents the new parameter and provides examples.
h2o-docs/src/product/data-science/algo-params/control_variables.rst	Links control_variables docs to remove_offset_effects docs.
h2o-core/src/main/java/hex/ModelMetricsBinomial.java	Minor signature cleanup (parameter rename).

Copilot · 2026-02-12T16:09:36Z

h2o-r/tests/testdir_algos/glm/runit_GLM_remove_offset_effects_explain.R

+glm_control_variables_explain <- function() {
+    df <- h2o.importFile("https://h2o-public-test-data.s3.amazonaws.com/smalldata/prostate/prostate.csv")


The test helper function name is still glm_control_variables_explain, which is misleading for a remove_offset_effects test and makes failures harder to triage. Please rename it (and the doTest description) to reflect remove_offset_effects.

Copilot · 2026-02-12T16:09:36Z

h2o-r/tests/testdir_algos/glm/runit_GLM_remove_offset_effects_explain.R

+    h2o.explain(unrestricted_prostate_glm, df)
+}
+
+doTest("GLM: Control variables works with expain", glm_control_variables_explain)


Fix typo in test description string: "expain" → "explain".

Suggested change

-doTest("GLM: Control variables works with expain", glm_control_variables_explain)
+doTest("GLM: Control variables works with explain", glm_control_variables_explain)

Copilot · 2026-02-12T16:09:36Z

h2o-algos/src/test/java/hex/glm/GLMControlVariablesTest.java

+                if(preds.vec(2).at(i) != preds2.vec(2).at(i)) differ++;
+            }
+            System.out.println(differ + " " + threshold);
+            assert differ > threshold;


This uses the Java assert keyword, which is typically disabled unless tests are run with -ea, so the check may not execute in CI. Prefer JUnit assertions (e.g. Assert.assertTrue(...)) to ensure the test always enforces the condition.

Suggested change

-            assert differ > threshold;
+            assertTrue("Expected number of differing predictions to exceed threshold", differ > threshold);

Copilot · 2026-02-12T16:09:37Z

h2o-algos/src/main/java/hex/glm/GLM.java

+        } if (_model._parms._remove_offset_effects) {
+            _scoringHistoryUnrestrictedModel.addIterationScore(_state._iter, _state.likelihood(), _state.objective());
+            _scoringHistory.addIterationScore(_state._iter, _state.likelihood(), _state.objective()); 


updateProgress(): this if (_remove_offset_effects) ... else ... block is not an else if to the preceding control_variables branch, so when control_variables is enabled the code will still fall into the else here and add an extra _scoringHistory.addIterationScore(...) each iteration. This can corrupt scoring history / early stopping bookkeeping; make these branches mutually exclusive.

Suggested change

-        } if (_model._parms._remove_offset_effects) {
-            _scoringHistoryUnrestrictedModel.addIterationScore(_state._iter, _state.likelihood(), _state.objective());
-            _scoringHistory.addIterationScore(_state._iter, _state.likelihood(), _state.objective());
+        } else if (_model._parms._remove_offset_effects) {
+            _scoringHistoryUnrestrictedModel.addIterationScore(_state._iter, _state.likelihood(), _state.objective());
+            _scoringHistory.addIterationScore(_state._iter, _state.likelihood(), _state.objective());
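
To see why the branches need to be mutually exclusive, here is a minimal Python sketch of the control flow this comment describes. It does not reproduce GLM.java: the function names are hypothetical, plain lists stand in for the scoring-history objects, and the body of the control_variables branch is an assumption based only on the comment's statement that an extra entry is added per iteration.

```python
def update_progress_separate_ifs(control_variables, remove_offset_effects):
    """Mimics `if (cv) {...} if (roe) {...} else {...}` (two separate statements)."""
    history, unrestricted_history = [], []
    if control_variables:
        unrestricted_history.append("iter")
        history.append("iter")
    if remove_offset_effects:       # separate `if`, not chained to the one above
        unrestricted_history.append("iter")
        history.append("iter")
    else:
        history.append("iter")      # also runs when control_variables is set
    return len(history), len(unrestricted_history)

def update_progress_chained(control_variables, remove_offset_effects):
    """Mimics `if (cv) {...} else if (roe) {...} else {...}` (one chain)."""
    history, unrestricted_history = [], []
    if control_variables:
        unrestricted_history.append("iter")
        history.append("iter")
    elif remove_offset_effects:
        unrestricted_history.append("iter")
        history.append("iter")
    else:
        history.append("iter")
    return len(history), len(unrestricted_history)

# Entries added per iteration when only control_variables is enabled.
separate = update_progress_separate_ifs(True, False)
chained = update_progress_chained(True, False)
```

In the separate-`if` version, enabling only control_variables records two entries in the main history per iteration; in the chained version exactly one branch runs, so each history gets at most one entry.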

Copilot · 2026-02-12T16:09:37Z

h2o-docs/src/product/data-science/algo-params/remove_offset_effects.rst

+
+Model metrics and scoring history are calculated for both the restricted model (with offset effects removed) and the unrestricted model (with offset effect included).
+
+To get the unrestricted model with its own metrics use ``glm.make_unrestriced_glm_model()``/``h2o.make_unrestricted_glm_model(glm)``.


Docs: the reference to glm.make_unrestriced_glm_model() appears to be a typo/non-existent function (also "unrestriced" misspelled). This should point to the actual API (h2o.make_unrestricted_glm_model in R / make_unrestricted_glm_model in Python) to avoid broken guidance.

Suggested change

-To get the unrestricted model with its own metrics use ``glm.make_unrestriced_glm_model()``/``h2o.make_unrestricted_glm_model(glm)``.
+To get the unrestricted model with its own metrics use ``h2o.make_unrestricted_glm_model(glm)`` (R) or ``make_unrestricted_glm_model(glm)`` (Python).

Copilot · 2026-02-12T16:09:40Z

h2o-algos/src/main/java/hex/glm/GLM.java

+                        mtrain._nobs, _model._output._validation_metrics._nobs, _state.lambda(), _state.alpha());
+            } else { // only doing training deviance
+                _scoringHistory.addIterationScore(true, false, _state._iter, _state.likelihood(),
+                        _state.objective(), _state.deviance(), Double.NaN, mtrain._nobs, 1, _state.lambda(),


Variable mtrain may be null at this access as suggested by this null guard.

Copilot · 2026-02-12T16:09:40Z

h2o-py/tests/testdir_algos/glm/pyunit_remove_offset_effects.py

+    glm_model.train(x=["name", "power", "year"], y="economy_20mpg", training_frame=cars)
+
+    predictions_train = glm_model.predict(cars).as_data_frame()
+    metrics = glm_model.training_model_metrics()


Variable metrics is not used.

Copilot · 2026-02-12T16:09:40Z

h2o-py/tests/testdir_algos/glm/pyunit_remove_offset_effects.py

+    metrics_2 = glm_model_2.training_model_metrics()
+    #print(metrics_2)


Variable metrics_2 is not used.

Suggested change

-    metrics_2 = glm_model_2.training_model_metrics()
-    #print(metrics_2)
+    #print(glm_model_2.training_model_metrics())

Copilot · 2026-02-12T16:09:41Z

h2o-py/tests/testdir_algos/glm/pyunit_remove_offset_effects.py

+    glm_model_roe.train(x=["name", "power", "year"], y="economy_20mpg", training_frame=cars)
+
+    predictions_train_cv = glm_model_roe.predict(cars).as_data_frame()
+    metrics_cv = glm_model_roe.training_model_metrics()


Variable metrics_cv is not used.

Suggested change

-    metrics_cv = glm_model_roe.training_model_metrics()
+    glm_model_roe.training_model_metrics()

Copilot · 2026-02-12T16:09:41Z

h2o-py/tests/testdir_algos/glm/pyunit_remove_offset_effects.py

+                                                   generate_scoring_history=True)
+    glm_model_roe_2.train(x=["name", "power", "year"], y="economy_20mpg", training_frame=cars)
+    predictions_train_cv2 = glm_model_roe_2.predict(cars).as_data_frame()
+    metrics_cv_2 = glm_model_roe_2.training_model_metrics()


Variable metrics_cv_2 is not used.

Suggested change

-    metrics_cv_2 = glm_model_roe_2.training_model_metrics()
+    glm_model_roe_2.training_model_metrics()

Add remove offset effect workaround

51d97f4

maurever self-assigned this Jan 22, 2026

maurever and others added 4 commits January 29, 2026 11:14

Implement remove offset effect

a25b949

GH-16676 implement offset API, tests

3df9ec4

Test the implementation is correct

34f4093

Remove unused parameter

afb6725

maurever requested review from Copilot, tomasfryda and valenad1 February 12, 2026 15:58

maurever added this to the 3.46.0.10 milestone Feb 12, 2026

Copilot started reviewing on behalf of maurever February 12, 2026 15:59

Copilot AI reviewed Feb 12, 2026

Implement copilot suggestions

1d9938a

maurever wants to merge 6 commits into rel-3.46.0 from maurever_GH-16676_remove_offset_effects
