[ENH] add _normalize_dist_str utility for consistent distribution string aliases #1045
Open
paramsureliya wants to merge 2 commits into sktime:main from
- Add skpro/regression/_dist_utils.py with _normalize_dist_str that maps case-insensitive aliases to canonical distribution class names (e.g. 'gaussian' -> 'Normal', 'lognormal' -> 'LogNormal')
- Integrate into NGBoostAdapter so NGBoostRegressor and NGBoostSurvival both accept all aliases transparently
- Update docstrings in NGBoostRegressor and NGBoostSurvival
- Add test_dist_utils.py with 57 unit and integration tests

Part of sktime#1023
Reference Issues/PRs
Part of #1023: a first step toward making distribution string handling uniform across all probabilistic regressors.
What does this implement/fix? Explain your changes.
Adds a shared `_normalize_dist_str` utility (`skpro/regression/_dist_utils.py`) that maps case-insensitive distribution string aliases to the canonical capitalized class name used internally in skpro, e.g. `"gaussian"` → `"Normal"`, `"lognormal"` → `"LogNormal"`, `"t"` → `"TDistribution"`.

Integrates this into `NGBoostAdapter` (inherited by both `NGBoostRegressor` and `NGBoostSurvival`) as a proof of concept. Normalization happens at point-of-use inside the adapter methods, not in `__init__`, so `get_params()` / `clone()` compatibility is preserved.

The remaining regressors from #1023 (XGBoostLSS, ResidualDouble, CyclicBoosting, GAMRegressor, GLMRegressor, GlumRegressor) will be addressed in follow-up PRs.
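
For reviewers who want the idea at a glance, here is a minimal sketch of alias normalization and point-of-use resolution. It is not the code in this PR: the names `normalize_dist_str_sketch`, `_ALIASES_SKETCH`, and `ToyAdapter` are hypothetical, the alias table contains only the three mappings quoted above, and the behaviour for unknown strings (warn, then return the input unchanged) is an assumption based on the test description.

```python
import warnings

# Hypothetical alias table. Only these three mappings are quoted in the
# PR description; the real table in _dist_utils.py may be larger.
_ALIASES_SKETCH = {
    "gaussian": "Normal",
    "lognormal": "LogNormal",
    "t": "TDistribution",
}

# Lower-cased canonical names also resolve to themselves, so that
# "normal", "NORMAL" and "Normal" all give "Normal" (idempotency).
_LOOKUP_SKETCH = {name.lower(): name for name in _ALIASES_SKETCH.values()}
_LOOKUP_SKETCH.update(_ALIASES_SKETCH)


def normalize_dist_str_sketch(dist):
    """Map a case-insensitive alias to a canonical name (illustrative only)."""
    if not isinstance(dist, str):
        # Non-string inputs (e.g. None or a class) pass through untouched.
        return dist
    key = dist.strip().lower()
    if key in _LOOKUP_SKETCH:
        return _LOOKUP_SKETCH[key]
    # Assumed behaviour: warn and return the input unchanged.
    warnings.warn(f"Unknown distribution string {dist!r}; leaving it as is.")
    return dist


class ToyAdapter:
    """Hypothetical stand-in for an adapter; not skpro's NGBoostAdapter."""

    def __init__(self, dist="Normal"):
        # Store the argument verbatim; no normalization in __init__.
        self.dist = dist

    def get_params(self):
        # Reports exactly what the user passed in.
        return {"dist": self.dist}

    def resolve_dist(self):
        # Normalize into a local variable at point-of-use;
        # self.dist is never overwritten.
        dist = normalize_dist_str_sketch(self.dist)
        return dist
```

With this layout, `ToyAdapter(dist="gaussian").get_params()` still reports `"gaussian"`, while `resolve_dist()` works with `"Normal"`. Keeping the stored argument identical to what the user passed is what scikit-learn-style parameter inspection and cloning expect.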
Does your contribution introduce a new dependency? If yes, which one?
No new dependencies.
What should a reviewer concentrate their feedback on?
- `_dist_utils.py`: any missing or incorrectly mapped aliases?
- `NGBoostAdapter`: normalization into a local `dist` variable at point-of-use, leaving `self.dist` unchanged for sklearn compatibility.
Did you add any tests for the change?
Yes
`skpro/regression/tests/test_dist_utils.py` with 57 tests across 3 classes:
- `TestNormalizeDistStr`: unit tests for every alias, canonical passthrough, non-string passthrough, unknown string warning, and idempotency
- `TestCrossRegressorAliasConsistency`: invariant tests ensuring the same alias resolves identically regardless of which regressor calls the function
- `TestNGBoostRegressorAliases`: end-to-end integration tests for NGBoostRegressor (auto-skipped if ngboost not installed)
Any other comments?
None.