Commit cecabff

talgalili authored and meta-codesync[bot] committed
Shorten CHANGELOG for v0.13.0 release
Summary: This commit updates the CHANGELOG to be shorter before releasing 0.13.0. Also shorten the welcome message.

Differential Revision: D88159391

fbshipit-source-id: c82301511c24be23171a7b149e26c348f5d22391
1 parent 18aed9f commit cecabff

2 files changed: 75 additions & 149 deletions

CHANGELOG.md

Lines changed: 65 additions & 137 deletions
@@ -1,33 +1,32 @@
-# 0.12.x (2025-11-21)
+# 0.13.x (???)
 
-> TODO: update 0.12.x to 0.13.0 before release.
+> TODO: update to final version
+
+# 0.13.0 (2025-12-02)
 
 ## New Features
 
-- **Raking algorithm refactor**
-- Removed `ipfn` dependency and replaced with a vectorized NumPy
-implementation (`_run_ipf_numpy`) for iterative proportional fitting,
-resulting in significant performance improvements and eliminating external
-dependency ([#135](https://github.com/facebookresearch/balance/pull/135)).
-- **Propensity modeling flexibility**
-- `ipw()` now accepts any sklearn classifier via the `model` argument and
-deprecates the old `sklearn_model` alias, enabling the use of models like
-random forests while preserving all existing trimming and diagnostic
-workflows. Dense-only estimators and models without linear coefficients are
-fully supported, and propensity probabilities are stabilized to avoid
-numerical issues.
-- Implemented logistic regression customization by passing a configured
+- **Propensity modeling beyond static logistic regression**
+- `ipw()` now accepts any sklearn classifier via the `model` argument,
+enabling the use of models like random forests and gradient boosting while
+preserving all existing trimming and diagnostic features. Dense-only
+estimators and models without linear coefficients are fully supported.
+Propensity probabilities are stabilized to avoid numerical issues.
+- Allow customization of logistic regression by passing a configured
 :class:`~sklearn.linear_model.LogisticRegression` instance through the
-`model` argument; the CLI now accepts `--ipw_logistic_regression_kwargs`
-JSON to build that estimator directly for command-line workflows.
+`model` argument. Also, the CLI now accepts
+`--ipw_logistic_regression_kwargs` JSON to build that estimator directly for
+command-line workflows.
 - **Covariate diagnostics**
 - Added KL divergence calculations for covariate comparisons (numeric and
 one-hot categorical), exposed via `BalanceDF.kld()` alongside linked-sample
 aggregation support.
-- **Renamed Balance___DF to BalanceDF___**
-- BalanceCovarsDF to BalanceDFCovars
-- BalanceOutcomesDF to BalanceDFOutcomes
-- BalanceWeightsDF to BalanceDFWeights
+- **Weighting Methods**
+- `rake()` and `poststratify()` now honour `weight_trimming_mean_ratio` and
+`weight_trimming_percentile`, trimming and renormalising weights through the
+enhanced `trim_weights(..., target_sum_weights=...)` API so the documented
+parameters work as expected
+([#147](https://github.com/facebookresearch/balance/pull/147)).
 
 ## Documentation
 
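The Covariate diagnostics entry in the hunk above mentions KL divergence over one-hot categorical covariates. As a rough illustration only, the textbook discrete form D(P || Q) = Σ p_i · log(p_i / q_i) can be sketched in plain Python. This is not the code behind `BalanceDF.kld()`; the helper name `kl_divergence` and the example proportions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Discrete KL divergence D(P || Q) in nats (hypothetical helper)."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of categories")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # a term with p_i = 0 contributes 0 by convention
        if q_i == 0:
            return math.inf  # P puts mass on a category that Q excludes
        total += p_i * math.log(p_i / q_i)
    return total


# Made-up category proportions for one covariate: sample vs. target.
sample_props = [0.5, 0.3, 0.2]
target_props = [0.4, 0.4, 0.2]
print(kl_divergence(sample_props, target_props))
```

A value of zero means the two sets of proportions are identical; larger values mean the sample's distribution sits further from the target's.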
@@ -44,138 +43,67 @@
 ([#145](https://github.com/facebookresearch/balance/pull/145)).
 - Added IPW quickstart tutorial showcasing default logistic regression and
 custom sklearn classifier usage in (`balance_quickstart.ipynb`).
+- Shorten the welcome message (for when importing the package).
 
 ## Code Quality & Refactoring
 
+- **Raking algorithm refactor**
+- Removed `ipfn` dependency and replaced with a vectorized NumPy
+implementation (`_run_ipf_numpy`) for iterative proportional fitting,
+resulting in significant performance improvements and eliminating external
+dependency ([#135](https://github.com/facebookresearch/balance/pull/135)).
+
 - **IPW method refactoring**
 - Reduced Cyclomatic Complexity Number (CCN) by extracting repeated code
-patterns into reusable helper functions:
-- Added `_compute_deviance()` to consolidate `2 * log_loss(...)` pattern
-(used 4+ times)
-- Added `_compute_proportion_deviance()` to consolidate `1 - dev/null_dev`
-pattern
-- Added `_convert_to_dense_array()` to consolidate sparse-to-dense matrix
-conversion pattern (CSC→CSR→dense)
-- Improved code maintainability by eliminating duplication in deviance
-calculations and matrix conversions
-- Fixed TODO: Removed manual ASMD improvement calculation and now uses
-existing `compute_asmd_improvement()` from `weighted_comparisons_stats.py`
+patterns into reusable helper functions: `_compute_deviance()`,
+`_compute_proportion_deviance()`, `_convert_to_dense_array()`.
+- Removed manual ASMD improvement calculation and now uses existing
+`compute_asmd_improvement()` from `weighted_comparisons_stats.py`
+
 - **Type safety improvements**
-- **Pyre-strict migration**: Converted 32 Python files from `# pyre-unsafe` to
-`# pyre-strict` mode, significantly improving type safety across the
-codebase. Files converted include core modules (`__init__.py`,
-`adjustment.py`, `balancedf_class.py`, `cli.py`, `sample_class.py`,
-`util.py`, `typing.py`), statistics modules
-(`stats_and_plots/general_stats.py`, `stats_and_plots/weighted_stats.py`,
-`stats_and_plots/weighted_comparisons_plots.py`,
-`stats_and_plots/weighted_comparisons_stats.py`,
-`stats_and_plots/weights_stats.py`), weighting methods
-(`weighting_methods/cbps.py`, `weighting_methods/ipw.py`,
-`weighting_methods/poststratify.py`, `weighting_methods/rake.py`), datasets
-module (`datasets/__init__.py`), and test files
-(`parent_balance/tests/test_adjust_null.py`,
-`parent_balance/tests/test_adjustment.py`,
-`parent_balance/tests/test_cbps.py`, `parent_balance/tests/test_cli.py`,
-`parent_balance/tests/test_datasets.py`, `parent_balance/tests/test_ipw.py`,
-`parent_balance/tests/test_logging.py`,
-`parent_balance/tests/test_poststratify.py`,
-`parent_balance/tests/test_rake.py`, `parent_balance/tests/test_sample.py`,
-`parent_balance/tests/test_stats_and_plots.py`,
-`parent_balance/tests/test_testutil.py`,
-`parent_balance/tests/test_util.py`)
-- **Modernized type hints to PEP 604 syntax**: Updated all type annotations
-across 11 files to use the newer PEP 604 union syntax (`X | Y` instead of
-`Union[X, Y]` and `X | None` instead of `Optional[X]`), improving code
-readability and aligning with Python 3.10+ typing conventions. Updated
-`from __future__ import` statements to use `annotations` instead of the
-older `absolute_import, division, print_function, unicode_literals`. Removed
-unnecessary `Union` and `Optional` imports from `typing`. Files updated:
-`__init__.py`, `adjustment.py`, `balancedf_class.py`, `cli.py`,
-`datasets/__init__.py`, `sample_class.py`,
-`stats_and_plots/weighted_comparisons_stats.py`,
-`stats_and_plots/weighted_stats.py`, `stats_and_plots/weights_stats.py`,
-`util.py`, `weighting_methods/ipw.py`.
-- **Important compatibility note**: Type alias definitions in `typing.py`
-retain `Union` syntax for Python 3.9 compatibility, as the `|` operator for
-type aliases only works at runtime in Python 3.10+. Added comprehensive
-inline documentation explaining this limitation and the distinction between
-type annotations (which support `|` with
-`from __future__ import annotations`) and type alias assignments (which
-require `Union` for runtime evaluation in Python 3.9).
-- **Enhanced type safety for plotting functions**: Replaced loose dictionary
-type hints with structured `TypedDict` definition (`DataFrameWithWeight`)
-for better type checking in `weighted_comparisons_plots.py`. Added
-`SampleName` type alias to precisely specify valid sample name literals.
-Removed numerous `# pyre-ignore` comments by properly handling type casts
-and narrowing types. Added validation for plotly `dist_type` parameter to
-raise clear errors when unsupported types are used.
-- Fixed missing `Any` import in `weighted_comparisons_plots.py` to resolve
-pyre-fixme[10] error
-- Added comprehensive type annotations for previously untyped parameters and
-return values throughout the codebase
-- Fixed type casts and narrowed types where appropriate
-- Initialized optional variables to handle pyre-fixme[61] issues
-- Updated method signatures to match parent class interfaces
-- **Replaced assert-based type narrowing with `_verify_value_type()` helper**:
-Refactored code to use the `_verify_value_type()` utility function instead
-of bare `assert x is not None` statements for type narrowing. This improves
-code clarity, provides better error messages, and follows best practices for
-pyre-strict mode. Enhanced `_verify_value_type()` in `testutil.py` with
-optional type checking via `isinstance()` and improved overload signatures.
-Changes applied to test files (`test_datasets.py`, `test_sample.py`,
-`test_stats_and_plots.py`, `test_testutil.py`, `test_util.py`,
-`test_weighted_comparisons_plots.py`) and production code (`ipw.py`).
+- Migrated 32 Python files from `# pyre-unsafe` to `# pyre-strict` mode,
+covering core modules, statistics, weighting methods, datasets, and test
+files
+- Modernized type hints to PEP 604 syntax (`X | Y` instead of `Union[X, Y]`)
+across 11 files for improved readability and Python 3.10+ alignment
+- Type alias definitions in `typing.py` retain `Union` syntax for Python 3.9
+compatibility
+- Enhanced plotting function type safety with `TypedDict` definitions and
+proper type narrowing
+- Replaced assert-based type narrowing with `_verify_value_type()` helper for
+better error messages and pyre-strict compliance
+
+- **Renamed Balance**_DF to BalanceDF_\*\*\*\*
+- BalanceCovarsDF to BalanceDFCovars
+- BalanceOutcomesDF to BalanceDFOutcomes
+- BalanceWeightsDF to BalanceDFWeights
 
 ## Bug Fixes
 
 - **Utility Functions**
-- Improved `quantize` function: preserves column ordering and replaces
-assertions with proper TypeError exceptions
-([#133](https://github.com/facebookresearch/balance/pull/133)).
+- Fixed `quantize()` to preserve column ordering and use proper TypeError
+exceptions ([#133](https://github.com/facebookresearch/balance/pull/133))
 - **Statistical Functions**
-- **Fixed division by zero in `asmd_improvement()`**: Added safety check to
-prevent RuntimeWarning when `asmd_mean_before` is zero or very close to zero
-(< 1e-10). The function now returns `0.0` (representing 0% improvement) when
-the sample was already perfectly matched to the target before adjustment,
-which is the semantically correct result. This eliminates the "invalid value
-encountered in scalar divide" warning that appeared in test runs.
-- **Weighting Methods**
-- `rake()` and `poststratify()` now honour `weight_trimming_mean_ratio` and
-`weight_trimming_percentile`, trimming and renormalising weights through the
-enhanced `trim_weights(..., target_sum_weights=...)` API so the documented
-parameters work as expected
-([#147](https://github.com/facebookresearch/balance/pull/147)).
+- Fixed division by zero in `asmd_improvement()` when `asmd_mean_before` is
+zero, now returns `0.0` for 0% improvement
 - **CLI & Infrastructure**
-- Replaced deprecated argparse FileType with pathlib.Path, eliminating
-PendingDeprecationWarning
-([#134](https://github.com/facebookresearch/balance/pull/134)).
+- Replaced deprecated argparse FileType with pathlib.Path
+([#134](https://github.com/facebookresearch/balance/pull/134))
 - **Weight Trimming**
-- Ensured both `weight_trimming_mean_ratio` and `weight_trimming_percentile`
-paths in `trim_weights()` return `pd.Series` with `dtype=np.float64` and
-preserve the original index.
-- **Fixed edge case in percentile-based winsorization**: `_validate_limit()`
-now automatically adjusts percentile limits upward by
-`min(2/n_weights, limit/10)` (capped at 1.0) before passing them to
-`scipy.stats.mstats.winsorize`. This prevents edge cases where discrete data
-distributions or floating-point precision issues could prevent winsorization
-at exact boundary percentiles, ensuring at least one value gets winsorized
-when a non-zero limit is specified
-([#144](https://github.com/facebookresearch/balance/issues/144)).
-- **Improved documentation**: Enhanced docstrings for `trim_weights()` and
-`_validate_limit()` to clearly explain the automatic limit adjustment
-mechanism, provide concrete examples of percentile behavior (e.g., how
-single values vs. tuples work), and document the relationship between mean
-ratio trimming and percentile-based winsorization.
+- Fixed `trim_weights()` to consistently return `pd.Series` with
+`dtype=np.float64` and preserve original index across both trimming methods
+- Fixed percentile-based winsorization edge case: `_validate_limit()` now
+automatically adjusts limits to prevent floating-point precision issues
+([#144](https://github.com/facebookresearch/balance/issues/144))
+- Enhanced documentation for `trim_weights()` and `_validate_limit()` with
+clearer examples and explanations
 
 ## Tests
 
-- Enhanced test coverage for weight trimming:
-- Added `test_trim_weights_return_type_consistency` to validate that both
-trimming methods return `pd.Series` with `dtype=np.float64` and preserve
-indices.
-- Added 11 comprehensive tests for `_validate_limit()` covering normal
-operation, edge cases, error conditions, type handling, and boundary
-conditions.
+- Enhanced test coverage for weight trimming with
+`test_trim_weights_return_type_consistency` and 11 comprehensive tests for
+`_validate_limit()` covering edge cases, error conditions, and boundary
+conditions
 
 ## Contributors
 
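The longer Weight Trimming entry removed in the hunk above says `_validate_limit()` raises a percentile limit by `min(2/n_weights, limit/10)`, capped at 1.0, before winsorizing. The arithmetic, as read from that changelog text alone, might look like the sketch below. `adjust_limit` is a hypothetical name rather than the library function, and the real `_validate_limit()` may validate inputs and handle edge cases differently.

```python
def adjust_limit(limit: float, n_weights: int) -> float:
    """Nudge a winsorization percentile limit upward (hypothetical sketch)."""
    if not 0.0 <= limit <= 1.0:
        raise ValueError("limit must be between 0 and 1")
    if n_weights <= 0:
        raise ValueError("n_weights must be positive")
    if limit == 0.0:
        return 0.0  # a zero limit means no winsorization, so leave it alone
    # The bump is small: at most two observations' worth of probability mass,
    # and never more than a tenth of the requested limit.
    bump = min(2 / n_weights, limit / 10)
    return min(limit + bump, 1.0)  # never exceed 1.0


print(adjust_limit(0.05, 1000))  # limit/10 = 0.005 exceeds 2/1000, so the bump is 0.002
print(adjust_limit(0.05, 10))    # 2/10 exceeds limit/10, so the bump is 0.005
```

Because the bump scales with the limit, a zero limit is never turned into a non-zero one, which matches the changelog's "when a non-zero limit is specified" wording.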
balance/__init__.py

Lines changed: 10 additions & 12 deletions
@@ -19,20 +19,18 @@
 from balance.util import TruncationFormatter # noqa
 
 global __version__
-__version__ = "0.12.x"
+__version__ = "0.13.0"
 
 WELCOME_MESSAGE = f"""
-Welcome to balance (Version {__version__})!
-An open-source Python package for balancing biased data samples.
-
-📖 Documentation: https://import-balance.org/
-🛠️ Get Help / Report Issues: https://github.com/facebookresearch/balance/issues/
-📄 Citation:
-Sarig, T., Galili, T., & Eilat, R. (2023).
-balance - a Python package for balancing biased data samples.
-https://arxiv.org/abs/2307.06024
-
-Tip: You can access this information at any time with balance.help()
+balance (Version {__version__}) loaded:
+📖 Documentation: https://import-balance.org/
+🛠️ Help / Issues: https://github.com/facebookresearch/balance/issues/
+📄 Citation:
+Sarig, T., Galili, T., & Eilat, R. (2023).
+balance - a Python package for balancing biased data samples.
+https://arxiv.org/abs/2307.06024
+
+Tip: You can view this message anytime with balance.help()
 """
 
 
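The `__init__.py` hunk above bumps `__version__` and interpolates it into `WELCOME_MESSAGE` with an f-string. F-strings are evaluated once, when the assignment runs, so the version has to be defined before the message. The sketch below illustrates this with a trimmed-down message; `show_help` is a hypothetical stand-in, not the package's `balance.help()`.

```python
__version__ = "0.13.0"

# The f-string is evaluated here, once, so the message captures the value
# that __version__ holds at this point in the module.
WELCOME_MESSAGE = f"""
balance (Version {__version__}) loaded:
    Documentation: https://import-balance.org/
"""


def show_help() -> None:
    """Print the stored message (hypothetical stand-in for balance.help())."""
    print(WELCOME_MESSAGE)


# Rebinding the name later does not change the already-built string.
__version__ = "0.13.1"
show_help()
```

This is why a release commit that changes the version string does not need to touch the message body for the new version to appear on import.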