Merged
Changes from all commits
30 commits
7a1572f
Added method to provide .transform in the scorecard.
jnsofini Feb 19, 2025
3ac8599
Added updates as new feature in release notes
jnsofini Feb 25, 2025
ae6d56d
Merge pull request #346 from jnsofini/feature/add-score-transform
guillermo-navas-palencia Mar 5, 2025
9bdb4bf
use weighted min and max bin size, and correct decision tree hyperpar…
YC-1412 May 28, 2025
c27af4f
fix ssum for std calculation
YC-1412 May 28, 2025
92b1329
return expected 8 number of items instead of 3
YC-1412 May 28, 2025
f363693
add dumpy _n_samples_weighted (same as _n_samples) for multiclass opt…
YC-1412 May 30, 2025
0374492
increase test piecewise tolerance to 1e-4. See details in https://git…
YC-1412 Jun 2, 2025
9d34b08
increase test piecewise tolerance to 1e-4. See details in https://git…
YC-1412 Jun 3, 2025
da450a1
Merge pull request #358 from YC-1412/update-compute-prebin-returns
guillermo-navas-palencia Jun 3, 2025
e6bf706
update with develop
YC-1412 Jun 3, 2025
44351e5
merge with develop (Merge branch 'develop' into 323-fix_bin_size_with_s…)
YC-1412 Jun 3, 2025
3eb7a34
Merge pull request #360 from YC-1412/324-fix_std
guillermo-navas-palencia Jun 3, 2025
ad30101
add notes that min/max_bin_size include missng and special group
YC-1412 Jun 4, 2025
10dd47a
Merge pull request #359 from YC-1412/323-fix_bin_size_with_sample_weight
guillermo-navas-palencia Jun 4, 2025
188d95b
feat: add to_dict method for binning table serialization
sudo-hannes Aug 28, 2025
d38c8a7
Merge branch 'develop' into add-to-dict-method
sudo-hannes Aug 29, 2025
9b989e8
refactor: remove unnecessary assignment of binning_table in OptimalBi…
sudo-hannes Aug 29, 2025
a2bab32
fix: remove unnecessary blank line in OptimalBinning class
sudo-hannes Aug 29, 2025
0f3ad8d
Replace Boston dataset URL with local CSV file path for testing
sudo-hannes Aug 29, 2025
856f2c3
Fix data source filename in load_boston function from boston_dataset.…
sudo-hannes Aug 29, 2025
5b2be3d
Merge pull request #372 from sudo-hannes/add-boston-dataset-to-repo
guillermo-navas-palencia Aug 30, 2025
b44e4cd
Merge branch 'develop' into add-to-dict-method
sudo-hannes Aug 31, 2025
a313fea
fix: Update check_array calls to use ensure_all_finite instead of for…
sudo-hannes Aug 31, 2025
30ad1f8
Merge pull request #373 from sudo-hannes/update-force-all-finite-to-e…
guillermo-navas-palencia Sep 1, 2025
058c039
Merge branch 'develop' into add-to-dict-method
sudo-hannes Sep 1, 2025
667abdd
fix: Correct indentation in docstring for OptimalBinning class
sudo-hannes Sep 1, 2025
b00530b
Merge pull request #371 from sudo-hannes/add-to-dict-method
guillermo-navas-palencia Sep 1, 2025
8a8d196
Update release notes and version to 0.21
guillermo-navas-palencia Oct 26, 2025
520fcc6
Update 2024 -> 2025
guillermo-navas-palencia Oct 26, 2025
2 changes: 1 addition & 1 deletion README.rst
Original file line number Diff line number Diff line change
@@ -81,7 +81,7 @@ OptBinning requires
* ortools (>=9.4)
* pandas
* ropwr (>=1.0.0)
* scikit-learn (>=1.0.2)
* scikit-learn (>=1.6.0)
* scipy (>=1.6.0)

OptBinning[distributed] requires additional packages
6 changes: 3 additions & 3 deletions doc/source/conf.py
@@ -18,13 +18,13 @@
# -- Project information -----------------------------------------------------

project = 'optbinning'
copyright = '2019 - 2024, Guillermo Navas-Palencia'
copyright = '2019 - 2025, Guillermo Navas-Palencia'
author = 'Guillermo Navas-Palencia'

# The short X.Y version
version = '0.20.0'
version = '0.21.0'
# The full version, including alpha/beta/rc tags
release = '0.20.0'
release = '0.21.0'


# -- General configuration ---------------------------------------------------
14 changes: 14 additions & 0 deletions doc/source/release_notes.rst
@@ -1,5 +1,19 @@
Release Notes
=============
Version 0.21.0 (2025-10-26)
---------------------------

New features:

- Add ``transform`` method in scorecard (`Issue 346 <https://github.com/guillermo-navas-palencia/optbinning/issues/346>`_).
- Add ``to_dict`` method for binning table serialization (`Issue 371 <https://github.com/guillermo-navas-palencia/optbinning/issues/371>`_).
- Replace Boston dataset with a local CSV file (`Issue 372 <https://github.com/guillermo-navas-palencia/optbinning/issues/372>`_).

Bugfixes:

- Use weighted min and max bin size, and correct decision tree hyperparameters when ``sample_weight`` is provided (`Issue 359 <https://github.com/guillermo-navas-palencia/optbinning/issues/359>`_).
- Fix ``ssums`` for std calculation (`Issue 360 <https://github.com/guillermo-navas-palencia/optbinning/issues/360>`_).


Version 0.20.1 (2025-02-23)
---------------------------
2 changes: 1 addition & 1 deletion optbinning/_version.py
@@ -1,3 +1,3 @@
"""Version information."""

__version__ = "0.20.1"
__version__ = "0.21.0"
58 changes: 40 additions & 18 deletions optbinning/binning/binning.py
@@ -290,7 +290,8 @@ class OptimalBinning(BaseOptimalBinning):
The maximum number of bins after pre-binning (prebins).

min_prebin_size : float (default=0.05)
The fraction of mininum number of records for each prebin.
The fraction of mininum number of records for each prebin
(including missing and ``special_code`` groups).

min_n_bins : int or None, optional (default=None)
The minimum number of bins. If None, then ``min_n_bins`` is
@@ -301,11 +302,13 @@
a value in ``[0, max_n_prebins]``.

min_bin_size : float or None, optional (default=None)
The fraction of minimum number of records for each bin. If None,
The fraction of minimum number of records for each bin
(including missing and ``special_code`` groups). If None,
``min_bin_size = min_prebin_size``.

max_bin_size : float or None, optional (default=None)
The fraction of maximum number of records for each bin. If None,
The fraction of maximum number of records for each bin
(including missing and ``special_code`` groups). If None,
``max_bin_size = 1.0``.

min_bin_n_nonevent : int or None, optional (default=None)
@@ -516,6 +519,7 @@ def __init__(self, name="", dtype="numerical", prebinning_method="cart",
self._n_prebins = None
self._n_refinements = 0
self._n_samples = None
self._n_samples_weighted = None
self._optimizer = None
self._solution = None
self._splits_optimal = None
@@ -711,10 +715,15 @@ def _fit(self, x, y, sample_weight, check_input):
logger.info("Pre-processing started.")

self._n_samples = len(x)
self._n_samples_weighted = sum(sample_weight) if sample_weight is not None else len(x)

if self.verbose:
logger.info("Pre-processing: number of samples: {}"
.format(self._n_samples))
if self._n_samples == self._n_samples_weighted:
logger.info("Pre-processing: number of samples: {}"
.format(self._n_samples))
else:
logger.info("Pre-processing: number of samples: {}. Weighted samples: {}"
.format(self._n_samples, self._n_samples_weighted))

time_preprocessing = time.perf_counter()

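The hunk above introduces `_n_samples_weighted`, the sum of the sample weights, falling back to the raw row count when no weights are given; it is this weighted total that the bin-size fractions use from here on. A minimal standalone sketch of that bookkeeping (the helper name is ours, not OptBinning's API):

```python
import numpy as np

def effective_n_samples(x, sample_weight=None):
    """Return (raw count, weighted count) the way the pre-processing step does."""
    n_samples = len(x)
    # With no weights the weighted count degenerates to the raw count,
    # so existing unweighted behaviour is unchanged.
    if sample_weight is not None:
        n_samples_weighted = float(np.sum(sample_weight))
    else:
        n_samples_weighted = n_samples
    return n_samples, n_samples_weighted

# Unweighted: both counts agree, so only one number needs logging.
assert effective_n_samples([1, 2, 3]) == (3, 3)
# Weighted: the weighted count is what now drives min/max bin sizes.
assert effective_n_samples([1, 2, 3], np.array([1.0, 2.0, 0.5])) == (3, 3.5)
```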
@@ -784,7 +793,7 @@ def _fit(self, x, y, sample_weight, check_input):
if self.dtype == "numerical":
user_splits = check_array(
self.user_splits, ensure_2d=False, dtype=None,
force_all_finite=True)
ensure_all_finite=True)

if len(set(user_splits)) != len(user_splits):
raise ValueError("User splits are not unique.")
@@ -880,7 +889,7 @@ def _fit_prebinning(self, x, y, y_missing, x_special, y_special, y_others,
class_weight=None, sw_clean=None, sw_missing=None,
sw_special=None, sw_others=None):

min_bin_size = int(np.ceil(self.min_prebin_size * self._n_samples))
min_bin_size = int(np.ceil(self.min_prebin_size * self._n_samples_weighted))

prebinning = PreBinning(method=self.prebinning_method,
n_bins=self.max_n_prebins,
@@ -916,12 +925,12 @@ def _fit_optimizer(self, splits, n_nonevent, n_event):

# Min/max number of bins
if self.min_bin_size is not None:
min_bin_size = int(np.ceil(self.min_bin_size * self._n_samples))
min_bin_size = int(np.ceil(self.min_bin_size * self._n_samples_weighted))
else:
min_bin_size = self.min_bin_size

if self.max_bin_size is not None:
max_bin_size = int(np.ceil(self.max_bin_size * self._n_samples))
max_bin_size = int(np.ceil(self.max_bin_size * self._n_samples_weighted))
else:
max_bin_size = self.max_bin_size

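These hunks convert the fractional `min_bin_size`/`max_bin_size` into absolute record counts by ceiling against the weighted total instead of the raw row count. A small worked example (the fraction and weighted total below are made-up values, not defaults taken from the diff):

```python
import numpy as np

min_bin_size_frac = 0.05       # hypothetical fractional min_bin_size
n_samples_weighted = 123.4     # hypothetical sum of sample weights

# Ceiling guarantees the constraint is never loosened by rounding down.
min_bin_size = int(np.ceil(min_bin_size_frac * n_samples_weighted))
print(min_bin_size)  # ceil(6.17) -> 7
```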
@@ -1177,18 +1186,15 @@ def status(self):

return self._status

def to_json(self, path):
def to_dict(self):
"""
Save optimal bins and/or splits points and transformation depending on
the target type.
Convert optimal bins and/or splits points and transformation depending on
the target type to dictionary.

Parameters
----------
path: The path where the json is going to be saved.
Returns
-------
opt_bin_dict : dict
"""
if path is None:
raise ValueError('Specify the path for the json file')

table = self.binning_table

opt_bin_dict = dict()
@@ -1210,6 +1216,22 @@ def to_json(self, path):
opt_bin_dict['cat_others'] = table.cat_others
opt_bin_dict['user_splits'] = table.user_splits

return opt_bin_dict

def to_json(self, path):
"""
Save optimal bins and/or splits points and transformation depending on
the target type.

Parameters
----------
path: The path where the json is going to be saved.
"""
if path is None:
raise ValueError('Specify the path for the json file')

opt_bin_dict = self.to_dict()

with open(path, "w") as write_file:
json.dump(opt_bin_dict, write_file)

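The refactor above extracts the dictionary-building into the new `to_dict`, and `to_json` now just serializes its result. A stripped-down sketch of this delegation pattern (class name and dictionary fields are placeholders, not the real binning-table contents):

```python
import json
import os
import tempfile

class BinningExport:
    """Illustrates the to_dict/to_json split; fields are placeholders."""

    def to_dict(self):
        # In OptimalBinning this is built from the fitted binning table.
        return {"splits": [1.0, 2.5], "dtype": "numerical"}

    def to_json(self, path):
        if path is None:
            raise ValueError("Specify the path for the json file")
        # to_json is now a thin wrapper over to_dict.
        with open(path, "w") as write_file:
            json.dump(self.to_dict(), write_file)

path = os.path.join(tempfile.gettempdir(), "bins.json")
BinningExport().to_json(path)
with open(path) as f:
    assert json.load(f) == {"splits": [1.0, 2.5], "dtype": "numerical"}
```

The benefit is that callers who want an in-memory representation (e.g. to store in a database) no longer have to round-trip through a file.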
13 changes: 8 additions & 5 deletions optbinning/binning/binning_process.py
@@ -448,7 +448,8 @@ class BinningProcess(Base, BaseEstimator, BaseBinningProcess):
The maximum number of bins after pre-binning (prebins).

min_prebin_size : float (default=0.05)
The fraction of mininum number of records for each prebin.
The fraction of mininum number of records for each prebin
(including missing and ``special_code`` groups).

min_n_bins : int or None, optional (default=None)
The minimum number of bins. If None, then ``min_n_bins`` is
@@ -459,11 +460,13 @@ class BinningProcess(Base, BaseEstimator, BaseBinningProcess):
a value in ``[0, max_n_prebins]``.

min_bin_size : float or None, optional (default=None)
The fraction of minimum number of records for each bin. If None,
The fraction of minimum number of records for each bin
(including missing and ``special_code`` groups). If None,
``min_bin_size = min_prebin_size``.

max_bin_size : float or None, optional (default=None)
The fraction of maximum number of records for each bin. If None,
The fraction of maximum number of records for each bin
(including missing and ``special_code`` groups). If None,
``max_bin_size = 1.0``.

max_pvalue : float or None, optional (default=None)
@@ -1082,10 +1085,10 @@ def _fit(self, X, y, sample_weight, check_input):
# check X and y data
if check_input:
X = check_array(X, ensure_2d=False, dtype=None,
force_all_finite='allow-nan')
ensure_all_finite='allow-nan')

y = check_array(y, ensure_2d=False, dtype=None,
force_all_finite=True)
ensure_all_finite=True)

check_consistent_length(X, y)

30 changes: 20 additions & 10 deletions optbinning/binning/continuous_binning.py
@@ -208,7 +208,8 @@ class ContinuousOptimalBinning(OptimalBinning):
The maximum number of bins after pre-binning (prebins).

min_prebin_size : float (default=0.05)
The fraction of mininum number of records for each prebin.
The fraction of mininum number of records for each prebin
(including missing and ``special_code`` groups).

min_n_bins : int or None, optional (default=None)
The minimum number of bins. If None, then ``min_n_bins`` is
@@ -219,11 +220,13 @@
a value in ``[0, max_n_prebins]``.

min_bin_size : float or None, optional (default=None)
The fraction of minimum number of records for each bin. If None,
The fraction of minimum number of records for each bin
(including missing and ``special_code`` groups). If None,
``min_bin_size = min_prebin_size``.

max_bin_size : float or None, optional (default=None)
The fraction of maximum number of records for each bin. If None,
The fraction of maximum number of records for each bin
(including missing and ``special_code`` groups). If None,
``max_bin_size = 1.0``.

monotonic_trend : str or None, optional (default="auto")
@@ -400,6 +403,7 @@ def __init__(self, name="", dtype="numerical", prebinning_method="cart",
self._n_prebins = None
self._n_refinements = 0
self._n_samples = None
self._n_samples_weighted = None
self._optimizer = None
self._splits_optimal = None
self._status = None
@@ -559,10 +563,15 @@ def _fit(self, x, y, sample_weight, check_input):
logger.info("Pre-processing started.")

self._n_samples = len(x)
self._n_samples_weighted = sum(sample_weight) if sample_weight is not None else len(x)

if self.verbose:
logger.info("Pre-processing: number of samples: {}"
.format(self._n_samples))
if self._n_samples == self._n_samples_weighted:
logger.info("Pre-processing: number of samples: {}"
.format(self._n_samples))
else:
logger.info("Pre-processing: number of samples: {}. Weighted samples: {}"
.format(self._n_samples, self._n_samples_weighted))

time_preprocessing = time.perf_counter()

@@ -633,7 +642,7 @@ def _fit(self, x, y, sample_weight, check_input):
if self.dtype == "numerical":
user_splits = check_array(
self.user_splits, ensure_2d=False, dtype=None,
force_all_finite=True)
ensure_all_finite=True)

if len(set(user_splits)) != len(user_splits):
raise ValueError("User splits are not unique.")
@@ -757,12 +766,12 @@ def _fit_optimizer(self, splits, n_records, sums, ssums, stds):
return

if self.min_bin_size is not None:
min_bin_size = int(np.ceil(self.min_bin_size * self._n_samples))
min_bin_size = int(np.ceil(self.min_bin_size * self._n_samples_weighted))
else:
min_bin_size = self.min_bin_size

if self.max_bin_size is not None:
max_bin_size = int(np.ceil(self.max_bin_size * self._n_samples))
max_bin_size = int(np.ceil(self.max_bin_size * self._n_samples_weighted))
else:
max_bin_size = self.max_bin_size

@@ -897,7 +906,8 @@ def _prebinning_refinement(self, splits_prebinning, x, y, y_missing,
def _compute_prebins(self, splits_prebinning, x, y, sw):
n_splits = len(splits_prebinning)
if not n_splits:
return splits_prebinning, np.array([]), np.array([])
return (splits_prebinning, np.array([]), np.array([]), np.array([]),
np.array([]), np.array([]), np.array([]), np.array([]))

if self.dtype == "categorical" and self.user_splits is not None:
indices = np.digitize(x, splits_prebinning, right=True)
@@ -920,7 +930,7 @@ def _compute_prebins(self, splits_prebinning, x, y, sw):
n_records[i] = np.sum(sw[mask])
ymask = sw[mask] * y[mask]
sums[i] = np.sum(ymask)
ssums[i] = np.sum(ymask ** 2)
ssums[i] = np.sum(sw[mask] * (y[mask] ** 2))
n_zeros[i] = np.count_nonzero(ymask == 0)
if len(ymask):
stds[i] = np.std(ymask)
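The `ssums` change above is a genuine bug fix: in a weighted sum of squares the weight must multiply `y**2` once, whereas squaring `sw * y` squares the weight as well. A quick numeric check of the two formulas:

```python
import numpy as np

sw = np.array([2.0, 1.0])   # sample weights
y = np.array([3.0, 4.0])    # target values

buggy = np.sum((sw * y) ** 2)   # sum((w*y)^2) = 36 + 16 = 52
fixed = np.sum(sw * (y ** 2))   # sum(w*y^2)  = 18 + 16 = 34

assert buggy == 52.0
assert fixed == 34.0
```

With unit weights both expressions coincide, which is why the bug only surfaced once `sample_weight` support was exercised.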
4 changes: 2 additions & 2 deletions optbinning/binning/mdlp.py
@@ -99,8 +99,8 @@ def fit(self, x, y):
def _fit(self, x, y):
_check_parameters(**self.get_params())

x = check_array(x, ensure_2d=False, force_all_finite=True)
y = check_array(y, ensure_2d=False, force_all_finite=True)
x = check_array(x, ensure_2d=False, ensure_all_finite=True)
y = check_array(y, ensure_2d=False, ensure_all_finite=True)

idx = np.argsort(x)
x = x[idx]
4 changes: 2 additions & 2 deletions optbinning/binning/metrics.py
@@ -14,8 +14,8 @@


def _check_x_y(x, y):
x = check_array(x, ensure_2d=False, force_all_finite=True)
y = check_array(y, ensure_2d=False, force_all_finite=True)
x = check_array(x, ensure_2d=False, ensure_all_finite=True)
y = check_array(y, ensure_2d=False, ensure_all_finite=True)

check_consistent_length(x, y)

13 changes: 9 additions & 4 deletions optbinning/binning/multiclass_binning.py
@@ -214,7 +214,8 @@ class MulticlassOptimalBinning(OptimalBinning):
The maximum number of bins after pre-binning (prebins).

min_prebin_size : float (default=0.05)
The fraction of mininum number of records for each prebin.
The fraction of mininum number of records for each prebin
(including missing and ``special_code`` groups).

min_n_bins : int or None, optional (default=None)
The minimum number of bins. If None, then ``min_n_bins`` is
@@ -225,11 +226,13 @@
a value in ``[0, max_n_prebins]``.

min_bin_size : float or None, optional (default=None)
The fraction of minimum number of records for each bin. If None,
The fraction of minimum number of records for each bin
(including missing and ``special_code`` groups). If None,
``min_bin_size = min_prebin_size``.

max_bin_size : float or None, optional (default=None)
The fraction of maximum number of records for each bin. If None,
The fraction of maximum number of records for each bin
(including missing and ``special_code`` groups). If None,
``max_bin_size = 1.0``.

monotonic_trend : str, array-like or None, optional (default="auto")
@@ -360,6 +363,7 @@ def __init__(self, name="", prebinning_method="cart", solver="cp",
self._n_prebins = None
self._n_refinements = 0
self._n_samples = None
self._n_samples_weighted = None
self._optimizer = None
self._splits_optimal = None
self._status = None
@@ -504,6 +508,7 @@ def _fit(self, x, y, check_input):
logger.info("Pre-processing started.")

self._n_samples = len(x)
self._n_samples_weighted = self._n_samples

if self.verbose:
logger.info("Pre-processing: number of samples: {}"
@@ -560,7 +565,7 @@ def _fit(self, x, y, check_input):
.format(n_splits))

user_splits = check_array(self.user_splits, ensure_2d=False,
dtype=None, force_all_finite=True)
dtype=None, ensure_all_finite=True)

if len(set(user_splits)) != len(user_splits):
raise ValueError("User splits are not unique.")