|
1 | | -# 0.12.x (2025-11-21) |
| 1 | +# 0.13.x (???) |
2 | 2 |
|
3 | | -> TODO: update 0.12.x to 0.13.0 before release. |
| 3 | +> TODO: update to final version |
| 4 | +
|
| 5 | +# 0.13.0 (2025-12-02) |
4 | 6 |
|
5 | 7 | ## New Features |
6 | 8 |
|
7 | | -- **Raking algorithm refactor** |
8 | | - - Removed `ipfn` dependency and replaced with a vectorized NumPy |
9 | | - implementation (`_run_ipf_numpy`) for iterative proportional fitting, |
10 | | - resulting in significant performance improvements and eliminating external |
11 | | - dependency ([#135](https://github.com/facebookresearch/balance/pull/135)). |
12 | | -- **Propensity modeling flexibility** |
13 | | - - `ipw()` now accepts any sklearn classifier via the `model` argument and |
14 | | - deprecates the old `sklearn_model` alias, enabling the use of models like |
15 | | - random forests while preserving all existing trimming and diagnostic |
16 | | - workflows. Dense-only estimators and models without linear coefficients are |
17 | | - fully supported, and propensity probabilities are stabilized to avoid |
18 | | - numerical issues. |
19 | | - - Implemented logistic regression customization by passing a configured |
| 9 | +- **Propensity modeling beyond static logistic regression** |
| 10 | + - `ipw()` now accepts any sklearn classifier via the `model` argument, |
| 11 | + enabling the use of models like random forests and gradient boosting while |
| 12 | + preserving all existing trimming and diagnostic features. Dense-only |
| 13 | + estimators and models without linear coefficients are fully supported. |
| 14 | + Propensity probabilities are stabilized to avoid numerical issues. |
| 15 | + - Logistic regression can now be customized by passing a configured |
20 | 16 | :class:`~sklearn.linear_model.LogisticRegression` instance through the |
21 | | - `model` argument; the CLI now accepts `--ipw_logistic_regression_kwargs` |
22 | | - JSON to build that estimator directly for command-line workflows. |
| 17 | + `model` argument. Also, the CLI now accepts |
| 18 | + `--ipw_logistic_regression_kwargs` JSON to build that estimator directly for |
| 19 | + command-line workflows. |
23 | 20 | - **Covariate diagnostics** |
24 | 21 | - Added KL divergence calculations for covariate comparisons (numeric and |
25 | 22 | one-hot categorical), exposed via `BalanceDF.kld()` alongside linked-sample |
26 | 23 | aggregation support. |
27 | | -- **Renamed Balance___DF to BalanceDF___** |
28 | | - - BalanceCovarsDF to BalanceDFCovars |
29 | | - - BalanceOutcomesDF to BalanceDFOutcomes |
30 | | - - BalanceWeightsDF to BalanceDFWeights |
| 24 | +- **Weighting Methods** |
| 25 | + - `rake()` and `poststratify()` now honour `weight_trimming_mean_ratio` and |
| 26 | + `weight_trimming_percentile`, trimming and renormalising weights through the |
| 27 | + enhanced `trim_weights(..., target_sum_weights=...)` API so the documented |
| 28 | + parameters work as expected |
| 29 | + ([#147](https://github.com/facebookresearch/balance/pull/147)). |
31 | 30 |
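As an illustration of the KL divergence entry above, here is a minimal pure-Python sketch of the discrete case (for example, category proportions of a one-hot encoded covariate in the sample versus the target). The function name `kl_divergence_sketch` and the `eps` floor are assumptions made for this example; they are not taken from `BalanceDF.kld()`, whose actual implementation may differ.

```python
import math
from typing import Sequence


def kl_divergence_sketch(
    p: Sequence[float], q: Sequence[float], eps: float = 1e-12
) -> float:
    """KL(p || q) for two discrete distributions over the same categories.

    Hypothetical helper for illustration only; not the balance implementation.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of categories")
    total_p, total_q = sum(p), sum(q)
    kld = 0.0
    for p_raw, q_raw in zip(p, q):
        # Normalize so both inputs sum to one.
        p_i, q_i = p_raw / total_p, q_raw / total_q
        if p_i > 0:
            # Floor q_i at eps (an assumption) to avoid dividing by zero.
            kld += p_i * math.log(p_i / max(q_i, eps))
    return kld
```

Identical distributions give a divergence of zero, and the value grows as the sample proportions move away from the target proportions.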
|
32 | 31 | ## Documentation |
33 | 32 |
|
|
44 | 43 | ([#145](https://github.com/facebookresearch/balance/pull/145)). |
45 | 44 | - Added IPW quickstart tutorial showcasing default logistic regression and |
46 | 45 | custom sklearn classifier usage in (`balance_quickstart.ipynb`). |
| 46 | +- Shortened the welcome message shown when importing the package. |
47 | 47 |
|
48 | 48 | ## Code Quality & Refactoring |
49 | 49 |
|
| 50 | +- **Raking algorithm refactor** |
| 51 | + - Removed `ipfn` dependency and replaced with a vectorized NumPy |
| 52 | + implementation (`_run_ipf_numpy`) for iterative proportional fitting, |
| 53 | + resulting in significant performance improvements and eliminating external |
| 54 | + dependency ([#135](https://github.com/facebookresearch/balance/pull/135)). |
| 55 | + |
50 | 56 | - **IPW method refactoring** |
51 | 57 | - Reduced Cyclomatic Complexity Number (CCN) by extracting repeated code |
52 | | - patterns into reusable helper functions: |
53 | | - - Added `_compute_deviance()` to consolidate `2 * log_loss(...)` pattern |
54 | | - (used 4+ times) |
55 | | - - Added `_compute_proportion_deviance()` to consolidate `1 - dev/null_dev` |
56 | | - pattern |
57 | | - - Added `_convert_to_dense_array()` to consolidate sparse-to-dense matrix |
58 | | - conversion pattern (CSC→CSR→dense) |
59 | | - - Improved code maintainability by eliminating duplication in deviance |
60 | | - calculations and matrix conversions |
61 | | - - Fixed TODO: Removed manual ASMD improvement calculation and now uses |
62 | | - existing `compute_asmd_improvement()` from `weighted_comparisons_stats.py` |
| 58 | + patterns into reusable helper functions: `_compute_deviance()`, |
| 59 | + `_compute_proportion_deviance()`, `_convert_to_dense_array()`. |
| 60 | + - Replaced the manual ASMD improvement calculation with the existing |
| 61 | + `compute_asmd_improvement()` from `weighted_comparisons_stats.py`. |
| 62 | + |
63 | 63 | - **Type safety improvements** |
64 | | - - **Pyre-strict migration**: Converted 32 Python files from `# pyre-unsafe` to |
65 | | - `# pyre-strict` mode, significantly improving type safety across the |
66 | | - codebase. Files converted include core modules (`__init__.py`, |
67 | | - `adjustment.py`, `balancedf_class.py`, `cli.py`, `sample_class.py`, |
68 | | - `util.py`, `typing.py`), statistics modules |
69 | | - (`stats_and_plots/general_stats.py`, `stats_and_plots/weighted_stats.py`, |
70 | | - `stats_and_plots/weighted_comparisons_plots.py`, |
71 | | - `stats_and_plots/weighted_comparisons_stats.py`, |
72 | | - `stats_and_plots/weights_stats.py`), weighting methods |
73 | | - (`weighting_methods/cbps.py`, `weighting_methods/ipw.py`, |
74 | | - `weighting_methods/poststratify.py`, `weighting_methods/rake.py`), datasets |
75 | | - module (`datasets/__init__.py`), and test files |
76 | | - (`parent_balance/tests/test_adjust_null.py`, |
77 | | - `parent_balance/tests/test_adjustment.py`, |
78 | | - `parent_balance/tests/test_cbps.py`, `parent_balance/tests/test_cli.py`, |
79 | | - `parent_balance/tests/test_datasets.py`, `parent_balance/tests/test_ipw.py`, |
80 | | - `parent_balance/tests/test_logging.py`, |
81 | | - `parent_balance/tests/test_poststratify.py`, |
82 | | - `parent_balance/tests/test_rake.py`, `parent_balance/tests/test_sample.py`, |
83 | | - `parent_balance/tests/test_stats_and_plots.py`, |
84 | | - `parent_balance/tests/test_testutil.py`, |
85 | | - `parent_balance/tests/test_util.py`) |
86 | | - - **Modernized type hints to PEP 604 syntax**: Updated all type annotations |
87 | | - across 11 files to use the newer PEP 604 union syntax (`X | Y` instead of |
88 | | - `Union[X, Y]` and `X | None` instead of `Optional[X]`), improving code |
89 | | - readability and aligning with Python 3.10+ typing conventions. Updated |
90 | | - `from __future__ import` statements to use `annotations` instead of the |
91 | | - older `absolute_import, division, print_function, unicode_literals`. Removed |
92 | | - unnecessary `Union` and `Optional` imports from `typing`. Files updated: |
93 | | - `__init__.py`, `adjustment.py`, `balancedf_class.py`, `cli.py`, |
94 | | - `datasets/__init__.py`, `sample_class.py`, |
95 | | - `stats_and_plots/weighted_comparisons_stats.py`, |
96 | | - `stats_and_plots/weighted_stats.py`, `stats_and_plots/weights_stats.py`, |
97 | | - `util.py`, `weighting_methods/ipw.py`. |
98 | | - - **Important compatibility note**: Type alias definitions in `typing.py` |
99 | | - retain `Union` syntax for Python 3.9 compatibility, as the `|` operator for |
100 | | - type aliases only works at runtime in Python 3.10+. Added comprehensive |
101 | | - inline documentation explaining this limitation and the distinction between |
102 | | - type annotations (which support `|` with |
103 | | - `from __future__ import annotations`) and type alias assignments (which |
104 | | - require `Union` for runtime evaluation in Python 3.9). |
105 | | - - **Enhanced type safety for plotting functions**: Replaced loose dictionary |
106 | | - type hints with structured `TypedDict` definition (`DataFrameWithWeight`) |
107 | | - for better type checking in `weighted_comparisons_plots.py`. Added |
108 | | - `SampleName` type alias to precisely specify valid sample name literals. |
109 | | - Removed numerous `# pyre-ignore` comments by properly handling type casts |
110 | | - and narrowing types. Added validation for plotly `dist_type` parameter to |
111 | | - raise clear errors when unsupported types are used. |
112 | | - - Fixed missing `Any` import in `weighted_comparisons_plots.py` to resolve |
113 | | - pyre-fixme[10] error |
114 | | - - Added comprehensive type annotations for previously untyped parameters and |
115 | | - return values throughout the codebase |
116 | | - - Fixed type casts and narrowed types where appropriate |
117 | | - - Initialized optional variables to handle pyre-fixme[61] issues |
118 | | - - Updated method signatures to match parent class interfaces |
119 | | - - **Replaced assert-based type narrowing with `_verify_value_type()` helper**: |
120 | | - Refactored code to use the `_verify_value_type()` utility function instead |
121 | | - of bare `assert x is not None` statements for type narrowing. This improves |
122 | | - code clarity, provides better error messages, and follows best practices for |
123 | | - pyre-strict mode. Enhanced `_verify_value_type()` in `testutil.py` with |
124 | | - optional type checking via `isinstance()` and improved overload signatures. |
125 | | - Changes applied to test files (`test_datasets.py`, `test_sample.py`, |
126 | | - `test_stats_and_plots.py`, `test_testutil.py`, `test_util.py`, |
127 | | - `test_weighted_comparisons_plots.py`) and production code (`ipw.py`). |
| 64 | + - Migrated 32 Python files from `# pyre-unsafe` to `# pyre-strict` mode, |
| 65 | + covering core modules, statistics, weighting methods, datasets, and test |
| 66 | + files |
| 67 | + - Modernized type hints to PEP 604 syntax (`X | Y` instead of `Union[X, Y]`) |
| 68 | + across 11 files for improved readability and Python 3.10+ alignment |
| 69 | + - Type alias definitions in `typing.py` retain `Union` syntax for Python 3.9 |
| 70 | + compatibility |
| 71 | + - Enhanced plotting function type safety with `TypedDict` definitions and |
| 72 | + proper type narrowing |
| 73 | + - Replaced assert-based type narrowing with `_verify_value_type()` helper for |
| 74 | + better error messages and pyre-strict compliance |
| 75 | + |
| 76 | +- **Renamed Balance___DF to BalanceDF___** |
| 77 | + - BalanceCovarsDF to BalanceDFCovars |
| 78 | + - BalanceOutcomesDF to BalanceDFOutcomes |
| 79 | + - BalanceWeightsDF to BalanceDFWeights |
128 | 80 |
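The distinction between PEP 604 annotations and `Union`-based type aliases noted above can be sketched as follows. The alias `Number` and the function `scale` are hypothetical names chosen for this example and do not come from `typing.py` in the package.

```python
from __future__ import annotations

from typing import Union

# A type alias is an ordinary assignment, so it is evaluated at runtime.
# Writing `int | float` here would raise TypeError on Python 3.9,
# which is why the alias keeps the `Union` spelling.
Number = Union[int, float]  # hypothetical alias for illustration


def scale(value: Number, factor: float | None = None) -> float:
    """Annotations are stored as strings under the __future__ import,

    so the `float | None` spelling is safe even on Python 3.9.
    """
    return float(value) * (1.0 if factor is None else factor)
```

With the `__future__` import in place, only runtime-evaluated expressions such as alias assignments need the older `Union` form.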
|
129 | 81 | ## Bug Fixes |
130 | 82 |
|
131 | 83 | - **Utility Functions** |
132 | | - - Improved `quantize` function: preserves column ordering and replaces |
133 | | - assertions with proper TypeError exceptions |
134 | | - ([#133](https://github.com/facebookresearch/balance/pull/133)). |
| 84 | + - Fixed `quantize()` to preserve column ordering and raise `TypeError` |
| 85 | + instead of using assertions ([#133](https://github.com/facebookresearch/balance/pull/133)) |
135 | 86 | - **Statistical Functions** |
136 | | - - **Fixed division by zero in `asmd_improvement()`**: Added safety check to |
137 | | - prevent RuntimeWarning when `asmd_mean_before` is zero or very close to zero |
138 | | - (< 1e-10). The function now returns `0.0` (representing 0% improvement) when |
139 | | - the sample was already perfectly matched to the target before adjustment, |
140 | | - which is the semantically correct result. This eliminates the "invalid value |
141 | | - encountered in scalar divide" warning that appeared in test runs. |
142 | | -- **Weighting Methods** |
143 | | - - `rake()` and `poststratify()` now honour `weight_trimming_mean_ratio` and |
144 | | - `weight_trimming_percentile`, trimming and renormalising weights through the |
145 | | - enhanced `trim_weights(..., target_sum_weights=...)` API so the documented |
146 | | - parameters work as expected |
147 | | - ([#147](https://github.com/facebookresearch/balance/pull/147)). |
| 87 | + - Fixed division by zero in `asmd_improvement()` when `asmd_mean_before` is |
| 88 | + zero; it now returns `0.0` (0% improvement) |
148 | 89 | - **CLI & Infrastructure** |
149 | | - - Replaced deprecated argparse FileType with pathlib.Path, eliminating |
150 | | - PendingDeprecationWarning |
151 | | - ([#134](https://github.com/facebookresearch/balance/pull/134)). |
| 90 | + - Replaced deprecated argparse FileType with pathlib.Path |
| 91 | + ([#134](https://github.com/facebookresearch/balance/pull/134)) |
152 | 92 | - **Weight Trimming** |
153 | | - - Ensured both `weight_trimming_mean_ratio` and `weight_trimming_percentile` |
154 | | - paths in `trim_weights()` return `pd.Series` with `dtype=np.float64` and |
155 | | - preserve the original index. |
156 | | - - **Fixed edge case in percentile-based winsorization**: `_validate_limit()` |
157 | | - now automatically adjusts percentile limits upward by |
158 | | - `min(2/n_weights, limit/10)` (capped at 1.0) before passing them to |
159 | | - `scipy.stats.mstats.winsorize`. This prevents edge cases where discrete data |
160 | | - distributions or floating-point precision issues could prevent winsorization |
161 | | - at exact boundary percentiles, ensuring at least one value gets winsorized |
162 | | - when a non-zero limit is specified |
163 | | - ([#144](https://github.com/facebookresearch/balance/issues/144)). |
164 | | - - **Improved documentation**: Enhanced docstrings for `trim_weights()` and |
165 | | - `_validate_limit()` to clearly explain the automatic limit adjustment |
166 | | - mechanism, provide concrete examples of percentile behavior (e.g., how |
167 | | - single values vs. tuples work), and document the relationship between mean |
168 | | - ratio trimming and percentile-based winsorization. |
| 93 | + - Fixed `trim_weights()` to consistently return `pd.Series` with |
| 94 | + `dtype=np.float64` and preserve original index across both trimming methods |
| 95 | + - Fixed percentile-based winsorization edge case: `_validate_limit()` now |
| 96 | + automatically adjusts limits to prevent floating-point precision issues |
| 97 | + ([#144](https://github.com/facebookresearch/balance/issues/144)) |
| 98 | + - Enhanced documentation for `trim_weights()` and `_validate_limit()` with |
| 99 | + clearer examples and explanations |
169 | 100 |
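The `asmd_improvement()` fix above amounts to guarding a ratio against a zero denominator. A minimal sketch follows, assuming the improvement is defined as `(before - after) / before` and reusing the `1e-10` threshold mentioned in the earlier changelog wording. The function name and signature are hypothetical, not the library's API.

```python
def asmd_improvement_sketch(
    asmd_mean_before: float, asmd_mean_after: float, eps: float = 1e-10
) -> float:
    """Relative ASMD improvement with a guard for a zero baseline.

    Hypothetical helper for illustration; the formula is an assumption.
    """
    # If the sample already matched the target, there is nothing to improve,
    # so report 0% improvement instead of dividing by (near) zero.
    if abs(asmd_mean_before) < eps:
        return 0.0
    return (asmd_mean_before - asmd_mean_after) / asmd_mean_before
```

Returning `0.0` in the guarded branch avoids the "invalid value encountered in scalar divide" style of warning without raising an error for an already balanced sample.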
|
170 | 101 | ## Tests |
171 | 102 |
|
172 | | -- Enhanced test coverage for weight trimming: |
173 | | - - Added `test_trim_weights_return_type_consistency` to validate that both |
174 | | - trimming methods return `pd.Series` with `dtype=np.float64` and preserve |
175 | | - indices. |
176 | | - - Added 11 comprehensive tests for `_validate_limit()` covering normal |
177 | | - operation, edge cases, error conditions, type handling, and boundary |
178 | | - conditions. |
| 103 | +- Enhanced test coverage for weight trimming with |
| 104 | + `test_trim_weights_return_type_consistency` and 11 comprehensive tests for |
| 105 | + `_validate_limit()` covering edge cases, error conditions, and boundary |
| 106 | + conditions |
179 | 107 |
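For context on what the `_validate_limit()` tests exercise, the earlier changelog wording describes the adjustment as raising the percentile limit by `min(2/n_weights, limit/10)`, capped at 1.0. The sketch below implements that arithmetic only. The function name `adjusted_limit_sketch` is an assumption, and the real helper may perform additional validation.

```python
def adjusted_limit_sketch(limit: float, n_weights: int) -> float:
    """Nudge a winsorization limit upward, capped at 1.0.

    Hypothetical helper for illustration; it mirrors only the arithmetic
    described in the changelog, not the actual `_validate_limit()` code.
    """
    if n_weights <= 0:
        raise ValueError("n_weights must be positive")
    if not 0.0 <= limit <= 1.0:
        raise ValueError("limit must be between 0 and 1")
    # The bump is small relative to both the sample size and the limit itself.
    bump = min(2.0 / n_weights, limit / 10.0)
    return min(limit + bump, 1.0)
```

A limit of zero stays at zero because the bump is proportional to the limit, which is consistent with the adjustment applying only when a non-zero limit is specified.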
|
180 | 108 | ## Contributors |
181 | 109 |
|
|