Fix weight diagnostics for DataFrame inputs #335

Closed
neuralsorcerer wants to merge 2 commits into facebookresearch:main from neuralsorcerer:data

Conversation

@neuralsorcerer
Collaborator

  • Implemented a fix in weight diagnostics by introducing _weights_to_series(...) and routing validation/computation through it so DataFrame weight inputs are consistently normalized to their first column. This was applied to design_effect, nonparametric_skew, prop_above_and_below, and weighted_median_breakdown_point.
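
For readers without the diff open, the normalization described above might look roughly like the following. This is a minimal sketch, not the code merged in this PR: the function name `weights_to_series_sketch`, the choice of `ValueError`, and the error message are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def weights_to_series_sketch(w):
    """Hypothetical stand-in for the PR's _weights_to_series helper.

    Normalizes list / ndarray / Series / DataFrame weights to a pd.Series.
    For a DataFrame, only the first column is used.
    """
    if isinstance(w, pd.DataFrame):
        # Assumption: a DataFrame with no columns is rejected explicitly.
        if w.shape[1] == 0:
            raise ValueError("weights DataFrame has no columns")
        return w.iloc[:, 0]
    if isinstance(w, pd.Series):
        return w
    # Lists and ndarrays are wrapped in a Series with a default index.
    return pd.Series(np.asarray(w))


# A DataFrame with an extra column is reduced to its first column.
df = pd.DataFrame({"w": [1.0, 2.0], "other": [5.0, 6.0]})
first_column = weights_to_series_sketch(df)
```

Routing every diagnostic through one helper like this keeps validation and computation working on the same object, which is the consistency the description refers to.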

Copilot AI review requested due to automatic review settings February 13, 2026 08:03
@meta-cla meta-cla bot added the cla signed label Feb 13, 2026
@neuralsorcerer neuralsorcerer added this to the balance 0.17.0 milestone Feb 13, 2026

Copilot AI left a comment

Pull request overview

This PR fixes inconsistent handling of weight diagnostics when weights are provided as a pd.DataFrame, by normalizing supported weight inputs to a single pd.Series (using the first DataFrame column) before validation and computation.

Changes:

  • Added _weights_to_series(...) helper to normalize list/ndarray/Series/DataFrame weight inputs (with explicit error for empty DataFrames).
  • Updated design_effect, nonparametric_skew, prop_above_and_below, and weighted_median_breakdown_point to route computation through _weights_to_series(...).
  • Expanded test coverage to ensure consistent behavior across list/ndarray/Series/DataFrame inputs and improved failure modes for empty/invalid DataFrames.
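
The parity that the new tests aim for can be illustrated with a small check. The sketch below is an assumption-laden illustration rather than the test code from this PR: `kish_design_effect` is a hypothetical local function implementing Kish's design effect, mean(w^2) / mean(w)^2, and the input names are invented for the example.

```python
import numpy as np
import pandas as pd


def kish_design_effect(w: pd.Series) -> float:
    """Kish's design effect: mean(w^2) / mean(w)^2 (hypothetical helper)."""
    return float((w ** 2).mean() / (w.mean() ** 2))


values = [1.0, 2.0, 3.0, 4.0]

# Each supported input type, normalized to a Series before computation.
# The DataFrame carries a second column that must not affect the result.
normalized = {
    "list": pd.Series(np.asarray(values)),
    "ndarray": pd.Series(np.array(values)),
    "series": pd.Series(values),
    "dataframe": pd.DataFrame({"w": values, "other": [9.0] * 4}).iloc[:, 0],
}

results = {name: kish_design_effect(w) for name, w in normalized.items()}

# All four input types should agree once normalized.
all_equal = len({round(r, 12) for r in results.values()}) == 1
```

With these weights, mean(w^2) is 7.5 and mean(w)^2 is 6.25, so every entry in `results` should equal 1.2.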

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

File | Description
balance/stats_and_plots/weights_stats.py | Adds _weights_to_series normalization and applies it across weight diagnostic functions.
tests/test_stats_and_plots.py | Adds tests for DataFrame-first-column behavior, empty DataFrame errors, and list/ndarray parity.
CHANGELOG.md | Documents the bug fix under the upcoming release notes.

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Contributor

@talgalili talgalili left a comment

LGTM

@meta-codesync
meta-codesync bot commented Feb 13, 2026

@talgalili has imported this pull request. If you are a Meta employee, you can view this in D93216988.

@meta-codesync
meta-codesync bot commented Feb 13, 2026

@talgalili merged this pull request in ca5a837.

@neuralsorcerer neuralsorcerer deleted the data branch February 14, 2026 06:51