[ENH] add _normalize_dist_str utility for consistent distribution string aliases #1045

Open
paramsureliya wants to merge 2 commits into sktime:main from paramsureliya:ENH/normalize-dist-str

@paramsureliya

Part of #1023

Reference Issues/PRs

Part of #1023: a first step toward making distribution string handling uniform across all probabilistic regressors.

What does this implement/fix? Explain your changes.

Adds a shared _normalize_dist_str utility (skpro/regression/_dist_utils.py) that maps case-insensitive distribution string aliases to the canonical capitalized class name used internally in skpro, e.g. "gaussian" -> "Normal", "lognormal" -> "LogNormal", "t" -> "TDistribution".
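For reviewers who want a feel for the behaviour without opening the diff, here is a minimal sketch of such a normalizer. It is not the code in this PR: the function name normalize_dist_str_sketch and the table _ALIASES_SKETCH are hypothetical, and only the three mappings quoted above come from this description. The extra lowercase entries for the canonical names are an assumption made so that canonical names pass through unchanged and the function is idempotent.

```python
import warnings

# Hypothetical alias table. Only "gaussian", "lognormal" and "t" are taken
# from the PR description; the remaining keys are assumptions added so that
# canonical names map to themselves after lowercasing.
_ALIASES_SKETCH = {
    "gaussian": "Normal",
    "normal": "Normal",
    "lognormal": "LogNormal",
    "t": "TDistribution",
    "tdistribution": "TDistribution",
}


def normalize_dist_str_sketch(dist):
    """Return the canonical name for a distribution alias (illustrative only)."""
    # Non-string inputs (e.g. a distribution class or None) pass through untouched.
    if not isinstance(dist, str):
        return dist

    # Case-insensitive lookup.
    key = dist.lower()
    if key in _ALIASES_SKETCH:
        return _ALIASES_SKETCH[key]

    # Unknown strings are returned unchanged, with a warning.
    warnings.warn(
        f"Unknown distribution string {dist!r}; passing it through unchanged.",
        UserWarning,
        stacklevel=2,
    )
    return dist
```

Under these assumptions, normalize_dist_str_sketch("GAUSSIAN") and normalize_dist_str_sketch("Normal") both return "Normal", and applying the function twice gives the same result as applying it once.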

Integrates this into NGBoostAdapter (inherited by both NGBoostRegressor and NGBoostSurvival) as a proof of concept. Normalization happens at point-of-use inside the adapter methods, not in __init__, so get_params()/clone() compatibility is preserved.
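The point-of-use pattern can be illustrated with a toy estimator. This is a sketch under stated assumptions, not the NGBoostAdapter code: SketchAdapter, _resolve_dist and _normalize_sketch are hypothetical names, and the one-entry alias table is a stand-in for the real utility. The idea is that __init__ stores the argument exactly as given, while the normalized value lives only in a local variable.

```python
def _normalize_sketch(dist):
    # Stand-in for the PR's _normalize_dist_str; the one-entry table is hypothetical.
    if not isinstance(dist, str):
        return dist
    return {"gaussian": "Normal"}.get(dist.lower(), dist)


class SketchAdapter:
    """Toy estimator showing normalization at point-of-use (illustrative only)."""

    def __init__(self, dist="Normal"):
        # Store the argument exactly as passed, with no normalization here.
        self.dist = dist

    def get_params(self):
        # Returns the user's original value, so a clone is built from the same input.
        return {"dist": self.dist}

    def _resolve_dist(self):
        # Normalize into a local variable; self.dist is never overwritten.
        dist = _normalize_sketch(self.dist)
        return dist


est = SketchAdapter(dist="gaussian")
resolved = est._resolve_dist()
rebuilt = SketchAdapter(**est.get_params())
```

Here resolved is "Normal", while est.dist and rebuilt.dist both remain "gaussian". Normalizing inside __init__ instead would change the stored parameter, which scikit-learn-style cloning and parameter inspection generally expect to be left as passed.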

The remaining regressors from #1023 (XGBoostLSS, ResidualDouble, CyclicBoosting, GAMRegressor, GLMRegressor, GlumRegressor) will be addressed in follow-up PRs.

Does your contribution introduce a new dependency? If yes, which one?

No new dependencies.

What should a reviewer concentrate their feedback on?

  • The alias map in _dist_utils.py: are any aliases missing or incorrectly mapped?
  • The integration pattern in NGBoostAdapter: normalization into a local dist variable at point-of-use, leaving self.dist unchanged for sklearn compatibility.

Did you add any tests for the change?

Yes: skpro/regression/tests/test_dist_utils.py, with 57 tests across 3 classes:

  • TestNormalizeDistStr: unit tests for every alias, canonical passthrough, non-string passthrough, unknown string warning, and idempotency
  • TestCrossRegressorAliasConsistency: invariant tests ensuring the same alias resolves identically regardless of which regressor calls the function
  • TestNGBoostRegressorAliases: end-to-end integration tests for NGBoostRegressor (auto-skipped if ngboost not installed)

Any other comments?

None.

- Add skpro/regression/_dist_utils.py with _normalize_dist_str that maps
  case-insensitive aliases to canonical distribution class names
  (e.g. 'gaussian' -> 'Normal', 'lognormal' -> 'LogNormal')
- Integrate into NGBoostAdapter so NGBoostRegressor and NGBoostSurvival
  both accept all aliases transparently
- Update docstrings in NGBoostRegressor and NGBoostSurvival
- Add test_dist_utils.py with 57 unit and integration tests

Part of sktime#1023