
Commit 653b419

stanmart, quant-ranger[bot], jtilly, MarcAntoineSchmidtQC and MatthiasSchmidtblaicherQC authored
glum v3.0 (#677)
* Make tests green with densematrix-refactor branch
* Remove most Matrixbase subclass checks
* Simplify _group_sum
* Pre-commit autoupdate (#672)
* Use boa in CI. (#673)
* Fix covariance matrix mutating feature names (#671)
* Do not use _set_up_... in covariance_matrix
* Add changelog entry
* Add the option to store the covariance matrix to avoid recomputing it (#661)
* Add option to store covariance matrix during fit
* Fix fitting with variance matrix estimation
  `.covariance_matrix()` expects X and weights in a different format than what we have at the end of `.fit()`.
* Store covariance matrix after estimation
* Handle the alpha_search and glm_cv cases
* Propagate covariance parameters
* Add changelog
* Slightly more lenient tests
* Pre-commit autoupdate (#676)
  Co-authored-by: quant-ranger[bot] <132915763+quant-ranger[bot]@users.noreply.github.com>
* Fix covariance_matrix dtypes
* Make CI use pre-release tabmat
* Column names à la Tabmat #278 (#678)
* Delegate column naming to tabmat
* Add tests
* More tests
* Test for dropping complete categories
* Add docstrings for new argument
* Add changelog entry
* Convert to pandas at the correct place
* Reorganize converting from pandas
* Remove xfail from test
* Formula interface (#670)
* Add formulaic to dependencies
* Add function for transforming the formula
* Add tests
* First draft of glum formula interface
* Fixes and tests
* Handle intercept correctly
* Add formula functionality to glm_cv
* Variables from local context
* Test predict with formulas
* Add formula tutorial
* Fix tutorial
* Reformat tutorial
* Improve function signatures and docstrings
* Handle two-sided formulas in covariance_matrix
* Make mypy happy about module names
* Matthias' suggestions
* Improve tutorial
* Improve tutorial
* Formula- and term-based Wald-tests (#689)
* Add formulaic to dependencies
* Add function for transforming the formula
* Add tests
* First draft of glum formula interface
* Fixes and tests
* Handle intercept correctly
* Add formula functionality to glm_cv
* Variables from local context
* Test predict with formulas
* Add formula tutorial
* Fix tutorial
* Reformat tutorial
* Improve function signatures and docstrings
* Handle two-sided formulas in covariance_matrix
* Make mypy happy about module names
* Matthias' suggestions
* Add back term-based Wald-tests
* Tests for term names
* Add formula-based Wald-test
* Tests for formula-based Wald-test
* Add changelog
* Fix exception message
* Additional test case
* make docstrings clearer in the case of terms
* Support for missing values in categorical columns (#684)
* Delegate column naming to tabmat
* Add tests
* More tests
* Test for dropping complete categories
* Add docstrings for new argument
* Add changelog entry
* Convert to pandas at the correct place
* Reorganize converting from pandas
* Remove xfail from test
* Implement missing categorical support
* Add test
* Solve adding missing category when predicting
* Apply Matthias' suggestions
* Add changelog entry
* Fix formula context (#691)
* Make tests fail
* Propagate context through methods
* pyupgrade
* ensure_full_rank != drop_first
* fix
* move feature name assignment to right spot
* fix
* remove blank line
* bump minimum formulaic version (stateful transforms)
* improve backward compatibility
* Remove code that is not needed in tabmat v4 / glum v3 (#741)
* Remove check_array from predict()
  We don't need it here as predict calls linear_predictor, and the latter does this check. We can avoid doing it twice.
* Remove _name_categorical_variable parts
  There is no need for those as Tabmat v4 handles variable names internally.
  ---------
  Co-authored-by: Martin Stancsics <martin.stancsics@gmail.com>
* Fix formula test: consider presence of intercept in full rankness check when constructing the model matrix externally (#746)
* deal with intercept in formula test correctly
* naming [skip ci]
* test varying significance level in coef table test (#749)
* pin formulaic to 0.6 (#752)
* Add illustration of formula interface to example in README (#751)
* add illustration of formula to readme
* rephrase
* spacing
* add linear term for illustration
* Determine presence of intercept only by `fit_intercept` argument (#747)
* always use self.fit_intercept; raise if formula conflicts with it
* wording [skip ci]
* adjust other tests, cosmetics
* don't compare specs with singular matrix to smf
* fix smf test formula
* fix intercept in context test
* remove outdated sentence; clean up
* fix
* adjust tutorial
* adjust tutorial
* consistent linebreaks in docstring
* remove obsolete arg in docstring
* Informative error when encountering categories that were not seen in training (#748)
* drop missings not seen in training
* zero not drop
* better (?) name [skip ci]
* catch case of unseen missings and fail method
* fix
* respect categorical missing method with formula; test different categorical missing methods also with formula
* shorten the tests
* don't allow fitting in case of conversion of categoricals and presence of formula
* clearer error msg
* also change the error msg in the regex (facepalm)
* remove matches
* fix
* better name
* describe more restrictive behavior in tutorial
* Raise error on unseen levels when predicting
* Allow cat_missing_method='convert' again
* Update test
* Check for unseen categories
* Adapt align_df_categories tests to changes
* Make pre-commit happy
* Avoid unnecessary work
* Correctly expand penalties with categoricals and `cat_missing_method="convert"` (#753)
* Correctly expand penalties when cat_missing_method=convert
* Add test
* Improve variable names
  Co-authored-by: Matthias Schmidtblaicher <42544829+MatthiasSchmidtblaicherQC@users.noreply.github.com>
  ---------
  Co-authored-by: Matthias Schmidtblaicher <42544829+MatthiasSchmidtblaicherQC@users.noreply.github.com>
* bump tabmat pre-release version
  ---------
  Co-authored-by: Martin Stancsics <martin.stancsics@gmail.com>
* docstring cosmetics
* even more docstring cosmetics
* Do not fail when an estimator misses class members that are new in v3 (#757)
* do not fail on missing class members that are new in v3
* simplify
* convert
* shorten the comment
* simplify
* don't use getattr unnecessarily
* cosmetics
* fix unrelated typo
* tiny cosmetics [skip ci]
* No regularization as default (#758)
* set alpha=0 as default
* fix docstring
* add alpha where needed to avoid LinAlgError
* add changelog entry
* also set alpha in golden master
* change name in persisted file too
* set alpha in model_parameters again
* don't modify case of no alpha attribute, which is RegressorCV
* remove invalid alpha argument
* wording
* Improve code readability
* Make arguments to public methods except `X`, `y`, `sample_weight` and `offset` keyword-only and make initialization keyword-only (#764)
* make all args except X, y, sample_weight, offset keyword only; make initialization keyword only
* add changelog [skip ci]
* mention that also RegressorBase was changed [skip ci]
* fix import
* clean up changelog
* Restructure distributions (#768)
* Explain `scale_predictors` more (#778)
* Expand on effect of scale_predictors and remove note
* Update src/glum/_glm.py
  Co-authored-by: Jan Tilly <jan.tilly@quantco.com>
* remove sentence
  ---------
  Co-authored-by: Jan Tilly <jan.tilly@quantco.com>
* Move helpers into `_utils` (#782)
* Patch docstring
* Update CHANGELOG.rst
  Co-authored-by: Luca Bittarello <15511539+lbittarello@users.noreply.github.com>
* Apply suggestions from code review
  Co-authored-by: Luca Bittarello <15511539+lbittarello@users.noreply.github.com>
* shorten docstrings of private functions; typos in defaults; other suggestions
* context docstring
* kwargs
* no context as default; small cleanups
* add explanation to get calling scope
* adjust to tabmat release
* keep whitespace
* temporarily add tabmat_dev channel again to investigate env solving failure on CI
* remove tabmat_dev channel again
* for now, disable conda build test on osx and Python 3.12
* Add a different environment for macos (#786)
* try solving on ci with different env for macos
* add missing if
* typo
* try and remove --no-test flag
* replace deprecated scipy.sparse.*_matrix.A
* replace other instance of .A
* two more
* simply replace all instances of .A by .toarray() (tabmat knows both)
* update CHANGELOG for release

---------

Co-authored-by: quant-ranger[bot] <132915763+quant-ranger[bot]@users.noreply.github.com>
Co-authored-by: Jan Tilly <jan.tilly@quantco.com>
Co-authored-by: Marc-Antoine Schmidt <marc-antoine.schmidt@quantco.com>
Co-authored-by: Matthias Schmidtblaicher <matthias.schmidtblaicher@quantco.com>
Co-authored-by: Matthias Schmidtblaicher <42544829+MatthiasSchmidtblaicherQC@users.noreply.github.com>
Co-authored-by: Martin Stancsics <martin.stancsics@gmail.com>
Co-authored-by: Luca Bittarello <15511539+lbittarello@users.noreply.github.com>
Co-authored-by: lbittarello <luca.bittarello@gmail.com>
1 parent 954abc6 commit 653b419

26 files changed: +4629 / -1390 lines changed
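The keyword-only rule from #764 (also the first two breaking changes in the changelog) can be sketched in plain Python with a bare `*` in the signature. The class is hypothetical: `KeywordOnlyRegressorStub` and its `extra_option` parameter are illustrative names, not glum's API. Only the split between positional `X`, `y`, `sample_weight`, `offset` and keyword-only everything else, plus the `alpha=0` default, follows the commit.

```python
class KeywordOnlyRegressorStub:
    """Hypothetical stand-in for the v3 calling convention; not glum's real class."""

    def __init__(self, *, alpha=0.0, l1_ratio=0.0):
        # The bare `*` makes every constructor argument keyword-only.
        # alpha=0.0 mirrors the new "no regularization" default.
        self.alpha = alpha
        self.l1_ratio = l1_ratio

    def fit(self, X, y, sample_weight=None, offset=None, *, extra_option=False):
        # X, y, sample_weight and offset may be passed positionally;
        # `extra_option` (a hypothetical parameter) must be passed by name.
        self.fitted_with_ = (len(X), len(y), sample_weight, offset, extra_option)
        return self


def raises_type_error(func, *args, **kwargs):
    """Return True if calling func(*args, **kwargs) raises TypeError."""
    try:
        func(*args, **kwargs)
    except TypeError:
        return True
    return False


model = KeywordOnlyRegressorStub(alpha=0.1)
# Passing alpha positionally is rejected under this convention.
print(raises_type_error(KeywordOnlyRegressorStub, 0.1))  # prints True
```

Under this convention, a call that passed options positionally fails loudly with a `TypeError` instead of silently binding a value to the wrong parameter.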

.github/workflows/ci.yml

Lines changed: 11 additions & 1 deletion
@@ -40,14 +40,24 @@ jobs:
     steps:
       - name: Checkout branch
         uses: actions/checkout@v4
-      - name: Set up conda env
+      - name: Set up conda env (windows and ubuntu)
+        if: matrix.os != 'macos-latest'
         uses: mamba-org/setup-micromamba@422500192359a097648154e8db4e39bdb6c6eed7
         with:
           environment-file: environment.yml
           init-shell: ${{ matrix.os == 'windows-latest' && 'powershell' || 'bash' }}
           cache-environment: true
           create-args: >-
             python=${{ matrix.python-version }}
+      - name: Set up conda env (macos)
+        if: matrix.os == 'macos-latest'
+        uses: mamba-org/setup-micromamba@422500192359a097648154e8db4e39bdb6c6eed7
+        with:
+          environment-file: environment-macos.yml
+          init-shell: bash
+          cache-environment: true
+          create-args: >-
+            python=${{ matrix.python-version }}
       - name: Install repository (unix)
         if: matrix.os != 'windows-latest'
         shell: bash -el {0}
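The two `Set up conda env` steps are selected by mutually exclusive `if:` conditions on `matrix.os`, and `init-shell` uses the Actions `&&`/`||` idiom as a ternary. As a rough model of that logic, Python's `and`/`or`, which also return one of their operands, express the same choice. The function name `conda_setup_for` is hypothetical, and it only covers the two keys that differ in this hunk.

```python
def conda_setup_for(os_name: str) -> dict:
    """Model which conda setup step runs for a given matrix.os value (sketch only)."""
    if os_name != "macos-latest":
        # "Set up conda env (windows and ubuntu)": `a == b && 'x' || 'y'` maps to
        # Python's `a == b and 'x' or 'y'`, since both return an operand.
        return {
            "environment-file": "environment.yml",
            "init-shell": os_name == "windows-latest" and "powershell" or "bash",
        }
    # "Set up conda env (macos)": separate environment file, always bash.
    return {"environment-file": "environment-macos.yml", "init-shell": "bash"}


for os_name in ("ubuntu-latest", "windows-latest", "macos-latest"):
    print(os_name, conda_setup_for(os_name))
```

The `and`/`or` idiom only works because `'powershell'` is a non-empty, truthy string. If the middle operand could be empty, both Python and the Actions expression would fall through to the last operand.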

.github/workflows/conda-build.yml

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ jobs:
           - { conda_build_yml: linux_64_python3.12.____cpython, os: ubuntu-latest, conda-build-args: '' }
           - { conda_build_yml: osx_64_python3.9.____cpython, os: macos-latest, conda-build-args: '' }
           - { conda_build_yml: osx_64_python3.12.____cpython, os: macos-latest, conda-build-args: '' }
-          - { conda_build_yml: osx_arm64_python3.10.____cpython, os: macos-latest, conda-build-args: ' --no-test' }
+          - { conda_build_yml: osx_arm64_python3.10.____cpython, os: macos-latest, conda-build-args: '' }
           - { conda_build_yml: win_64_python3.9.____cpython, os: windows-latest, conda-build-args: '' }
           - { conda_build_yml: win_64_python3.12.____cpython, os: windows-latest, conda-build-args: '' }
     steps:

CHANGELOG.rst

Lines changed: 24 additions & 1 deletion
@@ -7,6 +7,28 @@
 Changelog
 =========
 
+3.0.0 - 2024-04-27
+------------------
+
+**Breaking changes:**
+
+- All arguments to :class:`~glum.GeneralizedLinearRegressorBase`, :class:`~glum.GeneralizedLinearRegressor` and :class:`GeneralizedLinearRegressorCV` are now keyword-only.
+- All arguments to public methods of :class:`~glum.GeneralizedLinearRegressorBase`, :class:`~glum.GeneralizedLinearRegressor` or :class:`GeneralizedLinearRegressorCV` except ``X``, ``y``, ``sample_weight`` and ``offset`` are now keyword-only.
+- :class:`~glum.GeneralizedLinearRegressor`'s default value for ``alpha`` is now ``0``, i.e. no regularization.
+- :class:`~glum.GammaDistribution`, :class:`~glum.InverseGaussianDistribution`, :class:`~glum.NormalDistribution` and :class:`~glum.PoissonDistribution` no longer inherit from :class:`~glum.TweedieDistribution`.
+- The power parameter of :class:`~glum.TweedieLink` has been renamed from ``p`` to ``power``, in line with :class:`~glum.TweedieDistribution`.
+- :class:`~glum.TweedieLink` no longer instantiates :class:`~glum.IdentityLink` or :class:`~glum.LogLink` for ``power=0`` and ``power=1``, respectively. On the other hand, :class:`~glum.TweedieLink` is now compatible with ``power=0`` and ``power=1``.
+
+**New features:**
+
+- Added a formula interface for specifying models.
+- Improved feature name handling. Feature names are now created for non-pandas input matrices too. Furthermore, the format of categorical features can be specified by the user.
+- Term names are now stored in the model's attributes. This is useful for categorical features, where they refer to the whole variable, not just single levels.
+- Added more options for treating missing values in categorical columns. They can either raise a ``ValueError`` (``"fail"``), be treated as all-zero indicators (``"zero"``) or represented as a new category (``"convert"``).
+- `meth:GeneralizedLinearRegressor.wald_test` can now perform tests based on a formula string and term names.
+- :class:`~glum.InverseGaussianDistribution` gains a :meth:`~glum.InverseGaussianDistribution.log_likelihood` method.
+
+
 2.7.0 - 2024-02-19
 ------------------
 
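The `TweedieLink` entries describe a single power link that handles `power=0` and `power=1` itself instead of handing off to `IdentityLink` or `LogLink`. A minimal sketch, assuming the common power-link form g(mu) = mu**(1 - power) with a log link at power == 1 (glum's internal scaling may differ), shows why power=0 reduces to the identity while power=1 needs its own branch. The function names are hypothetical.

```python
import math


def tweedie_link(mu: float, power: float) -> float:
    """Assumed power link g(mu) = mu**(1 - power), switching to log at power == 1.

    Sketch only, not glum's TweedieLink. Intended for mu > 0
    (any real mu works when power == 0, where g is the identity).
    """
    if power == 1:
        # mu**0 would be constant, so the power family uses log here.
        return math.log(mu)
    return mu ** (1 - power)


def tweedie_inverse_link(eta: float, power: float) -> float:
    """Inverse of tweedie_link: mu = eta**(1 / (1 - power)), or exp(eta) at power == 1."""
    if power == 1:
        return math.exp(eta)
    return eta ** (1 / (1 - power))


print(tweedie_link(3.5, 0))  # identity at power=0: prints 3.5
print(tweedie_inverse_link(tweedie_link(2.0, 1.5), 1.5))  # round trip, approximately 2.0
```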
@@ -16,7 +38,7 @@ Changelog
 
 **Other changes:**
 
-- Require Python>=3.9 in line with `NEP 29 <https://numpy.org/neps/nep-0029-deprecation_policy.html#support-table>`_
+- Require Python>=3.9 in line with `NEP 29 <https://numpy.org/neps/nep-0029-deprecation_policy.html#support-table>`.
 - Build and test with Python 3.12 in CI.
 - Added line search stopping criterion for tiny loss improvements based on gradient information.
 - Added warnings about breaking changes in future versions.
@@ -73,6 +95,7 @@ Changelog
   :class:`~glum.GeneralizedLinearRegressor` and :class:`~glum.GeneralizedLinearRegressorCV`
   to ``'negative.binomial'``.
 
+
 2.4.1 - 2023-03-14
 ------------------
 
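The three options for missing values in categorical columns (`"fail"`, `"zero"`, `"convert"`), together with the new error for levels not seen in training, can be illustrated with a toy dummy encoder. `encode_categorical` is a hypothetical helper and the `"(MISSING)"` label is a placeholder assumption; this is not how glum or tabmat implement the encoding.

```python
def encode_categorical(values, categories, *, cat_missing_method="fail", missing_name="(MISSING)"):
    """Dummy-encode one categorical column, treating None as a missing value.

    Toy illustration of the three cat_missing_method options; not glum or tabmat code.
    Returns (column_names, rows), where each row is a list of 0/1 indicators.
    """
    if cat_missing_method not in ("fail", "zero", "convert"):
        raise ValueError(f"unknown cat_missing_method: {cat_missing_method!r}")
    has_missing = any(v is None for v in values)
    if has_missing and cat_missing_method == "fail":
        raise ValueError("categorical column contains missing values")
    columns = list(categories)
    if has_missing and cat_missing_method == "convert":
        # Missing values get their own indicator column, i.e. a new category.
        columns.append(missing_name)
    rows = []
    for v in values:
        if v is None:
            # "convert": flag the extra column; "zero": leave every indicator at 0.
            extra = [1] if cat_missing_method == "convert" else []
            rows.append([0] * len(categories) + extra)
        elif v not in categories:
            # Levels outside the known categories are rejected outright.
            raise ValueError(f"category {v!r} was not seen in training")
        else:
            rows.append([1 if v == c else 0 for c in columns])
    return columns, rows


print(encode_categorical(["a", None, "b"], ["a", "b"], cat_missing_method="zero"))
print(encode_categorical(["a", None, "b"], ["a", "b"], cat_missing_method="convert"))
```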
README.md

Lines changed: 10 additions & 1 deletion
@@ -68,7 +68,7 @@ Why did we choose the name `glum`? We wanted a name that had the letters GLM and
 >>>
 >>> _ = model.fit(X=X, y=y)
 >>>
->>> # .report_diagnostics shows details about the steps taken by the iterative solver
+>>> # .report_diagnostics shows details about the steps taken by the iterative solver.
 >>> diags = model.get_formatted_diagnostics(full_report=True)
 >>> diags[['objective_fct']]
         objective_fct
@@ -79,6 +79,15 @@ n_iter
 3            0.443681
 4            0.443498
 5            0.443497
+>>>
+>>> # Models can also be built with formulas from formulaic.
+>>> model_formula = GeneralizedLinearRegressor(
+...     family='binomial',
+...     l1_ratio=1.0,
+...     alpha=0.001,
+...     formula="bedrooms + np.log(bathrooms + 1) + bs(sqft_living, 3) + C(waterfront)"
+... )
+>>> _ = model_formula.fit(X=house_data.data, y=y)
 
 ```
 
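The README example pairs `alpha=0.001` with `l1_ratio=1.0`, and the changelog makes `alpha=0` (no regularization) the default. Assuming the scikit-learn-style elastic-net parameterisation, alpha * (l1_ratio * ||w||_1 + (1 - l1_ratio) / 2 * ||w||_2^2), the hypothetical helper below computes the penalty added to the loss for a list of non-intercept coefficients. It ignores any per-coefficient penalty weights.

```python
def elastic_net_penalty(coefs, *, alpha=0.0, l1_ratio=0.0):
    """Elastic-net penalty on non-intercept coefficients.

    Assumes the scikit-learn-style parameterisation; a sketch, not glum's objective code.
    """
    if not 0.0 <= l1_ratio <= 1.0:
        raise ValueError("l1_ratio must lie in [0, 1]")
    l1 = sum(abs(w) for w in coefs)          # lasso part, ||w||_1
    l2_squared = sum(w * w for w in coefs)   # ridge part, ||w||_2^2
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2_squared)


coefs = [0.5, -2.0, 0.0]
print(elastic_net_penalty(coefs))                              # default alpha=0: prints 0.0
print(elastic_net_penalty(coefs, alpha=0.001, l1_ratio=1.0))   # pure lasso, as in the README
```

With `alpha=0.0` the penalty vanishes for any `l1_ratio`, which is what "no regularization" means for the new default.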
conda.recipe/meta.yaml

Lines changed: 2 additions & 1 deletion
@@ -35,7 +35,8 @@ requirements:
     - pandas
     - scikit-learn >=0.23
     - scipy
-    - tabmat >=3.1.0, <4.0.0
+    - formulaic >=0.6
+    - tabmat >=4.0.0
 
 test:
   requires:
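The recipe swaps the `tabmat >=3.1.0, <4.0.0` pin for `tabmat >=4.0.0` and adds `formulaic >=0.6`, so no tabmat 3.x release satisfies the new run requirements. The hypothetical `satisfies` helper below checks simple comma-separated constraints for plain numeric versions only. Conda's real match-spec handling (pre-releases, wildcards, build strings) is much richer.

```python
import operator

# Two-character operators come first so ">=" is not read as ">".
_OPS = [(">=", operator.ge), ("<=", operator.le), ("==", operator.eq),
        (">", operator.gt), ("<", operator.lt)]


def parse_version(text: str) -> tuple:
    """Turn '4.0.0' into (4, 0, 0); plain numeric versions only (a simplification)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version: str, constraint: str) -> bool:
    """Check a version against clauses like '>=3.1.0, <4.0.0' (sketch, not conda's matcher)."""
    v = parse_version(version)
    for clause in constraint.split(","):
        clause = clause.strip()
        for symbol, compare in _OPS:
            if clause.startswith(symbol):
                if not compare(v, parse_version(clause[len(symbol):].strip())):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


print(satisfies("3.1.2", ">=3.1.0, <4.0.0"), satisfies("3.1.2", ">=4.0.0"))  # old pin vs new pin
```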
