Route unmentioned columns to ignore_columns by talgalili · Pull Request #333 · facebookresearch/balance

talgalili · 2026-02-09T21:24:17Z

Summary:
Previously, when --outcome_columns was not explicitly set, all columns that were not the id, weight, or a covariate were automatically classified as outcome_columns. This was surprising — e.g. columns listed in --keep_columns would silently become outcome columns.

Now, unmentioned columns are placed into ignore_columns instead. Columns that are explicitly mentioned (id, weight, covariates, and any columns specified via --outcome_columns, --keep_columns, --batch_columns, or --keep_row_column) are not ignored. Pipelines that relied on the automatic outcome classification must now pass those column names explicitly via --outcome_columns.

Also improved docstrings for has_keep_columns(), keep_columns(), and the --keep_columns argument to clarify their behavior.

Differential Revision: D92749122

meta-codesync · 2026-02-09T21:24:25Z

@talgalili has exported this pull request. If you are a Meta employee, you can view the originating Diff in D92749122.

Copilot

Pull request overview

This PR changes the balance CLI’s default column classification so that when --outcome_columns is not provided, previously-unclassified columns are routed into ignore_columns (instead of being auto-treated as outcomes), avoiding surprising behavior (e.g., --keep_columns silently becoming outcomes). It also updates related CLI docs and records the breaking change in the changelog.

Changes:

Update BalanceCLI.process_batch() to stop inferring outcome columns by default and instead populate ignore_columns with unmentioned columns.
Update/clarify keep_columns docstrings and CLI --keep_columns help text.
Adjust CLI tests to assert the new ignore_columns behavior; add changelog entry for the breaking change.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File	Description
`balance/cli.py`	Changes default column routing in `process_batch()` and updates `keep_columns` docs / CLI arg help.
`tests/test_cli.py`	Updates tests to validate that unmentioned columns now go to `ignore_columns` and outcomes default to `None`.
`balance/__init__.py`	Updates the package `__version__` string.
`CHANGELOG.md`	Adds a `0.17.0 (TBD)` breaking-change entry describing the new CLI behavior.

balance/cli.py

balance/__init__.py

Copilot · 2026-02-09T21:30:13Z

balance/cli.py

        "--keep_columns",
        type=str,
        required=False,
-        help="Set of columns we include in the output csv file",
+        help=(
+            "Comma-separated columns to include in the output csv file. "
+            "The output will be subsetted to only these columns."
+        ),


The CLI help text for --outcome_columns earlier in this file still describes the old default inference (“if not supplied… treated as outcomes”), but process_batch() no longer infers outcomes by default. Please update that help text to match the new behavior so users aren’t misled.

Copilot · 2026-02-09T21:30:13Z

tests/test_cli.py

-                if outcome_columns is None:
-                    raise AssertionError("Expected outcome_columns to be provided.")
                cls.calls.append(outcome_columns)
                cls.ignore_calls.append(ignore_columns)


RecordingSample.calls is annotated as list[tuple[str, ...]], but from_frame() now appends outcome_columns which can be None. Under # pyre-strict this will be a type error; update the annotation (and/or the from_frame signature) so the list element type includes None.

Copilot · 2026-02-09T21:30:14Z

balance/cli.py


+        Keep columns control which columns appear in the final output CSV.
+        After adjustment, the output DataFrame is subsetted to contain
+        **only** these columns (see :meth:`filter_output`).


The docstring references :meth:filter_output``, but BalanceCLI doesn’t define a `filter_output` method (the behavior appears to live in `adapt_output`). Update the reference so the docs don’t point to a nonexistent method.

Suggested change

**only** these columns (see :meth:`filter_output`).

**only** these columns (see :meth:`adapt_output`).

Summary: Previously, when --outcome_columns was not explicitly set, all columns that were not the id, weight, or a covariate were automatically classified as outcome_columns. This was surprising — e.g. columns listed in --keep_columns would silently become outcome columns. Now, unmentioned columns are placed into ignore_columns instead. Columns that are explicitly mentioned (id, weight, covariates, and any columns specified via --outcome_columns, --keep_columns, --batch_columns, or --keep_row_column) are not ignored. Pipelines that relied on the automatic outcome classification must now pass those column names explicitly via --outcome_columns. Also improved docstrings for has_keep_columns(), keep_columns(), and the --keep_columns argument to clarify their behavior. Differential Revision: D92749122

Summary: Pull Request resolved: facebookresearch#333 Previously, when --outcome_columns was not explicitly set, all columns that were not the id, weight, or a covariate were automatically classified as outcome_columns. This was surprising — e.g. columns listed in --keep_columns would silently become outcome columns. Now, unmentioned columns are placed into ignore_columns instead. Columns that are explicitly mentioned (id, weight, covariates, and any columns specified via --outcome_columns, --keep_columns, --batch_columns, or --keep_row_column) are not ignored. Pipelines that relied on the automatic outcome classification must now pass those column names explicitly via --outcome_columns. Also improved docstrings for has_keep_columns(), keep_columns(), and the --keep_columns argument to clarify their behavior. Differential Revision: D92749122

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

CHANGELOG.md

Copilot · 2026-02-09T21:52:43Z

tests/test_cli.py

+        with (
+            tempfile.TemporaryDirectory() as temp_dir,
+            tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as in_file,
+        ):


This test creates a NamedTemporaryFile(..., delete=False) but never removes it after the test finishes. Please either set delete=True (closing the handle before it’s read) or explicitly os.unlink(in_file.name) in a finally block to avoid leaving temp files behind.

Summary: Pull Request resolved: facebookresearch#333 Previously, when --outcome_columns was not explicitly set, all columns that were not the id, weight, or a covariate were automatically classified as outcome_columns. This was surprising — e.g. columns listed in --keep_columns would silently become outcome columns. Now, unmentioned columns are placed into ignore_columns instead. Columns that are explicitly mentioned (id, weight, covariates, and any columns specified via --outcome_columns, --keep_columns, --batch_columns, or --keep_row_column) are not ignored. Pipelines that relied on the automatic outcome classification must now pass those column names explicitly via --outcome_columns. Also improved docstrings for has_keep_columns(), keep_columns(), and the --keep_columns argument to clarify their behavior. Differential Revision: D92749122

meta-codesync · 2026-02-09T22:34:22Z

This pull request has been merged in a83f409.

Copilot AI review requested due to automatic review settings February 9, 2026 21:24

meta-cla bot added the cla signed label Feb 9, 2026

meta-codesync bot added fb-exported meta-exported labels Feb 9, 2026

Copilot started reviewing on behalf of talgalili February 9, 2026 21:25 View session

Copilot AI reviewed Feb 9, 2026

View reviewed changes

talgalili force-pushed the export-D92749122 branch from 48114d5 to ad5faae Compare February 9, 2026 21:46

talgalili requested a review from Copilot February 9, 2026 21:47

Copilot started reviewing on behalf of talgalili February 9, 2026 21:47 View session

talgalili force-pushed the export-D92749122 branch from ad5faae to 18148fc Compare February 9, 2026 21:48

talgalili force-pushed the export-D92749122 branch from 18148fc to 412bcdb Compare February 9, 2026 21:49

talgalili force-pushed the export-D92749122 branch from 412bcdb to b2a39f8 Compare February 9, 2026 21:52

Copilot AI reviewed Feb 9, 2026

View reviewed changes

talgalili force-pushed the export-D92749122 branch from b2a39f8 to 23ad03b Compare February 9, 2026 21:56

meta-codesync bot closed this in a83f409 Feb 9, 2026

facebook-github-bot added the Merged label Feb 9, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Route unmentioned columns to ignore_columns#333

Route unmentioned columns to ignore_columns#333
talgalili wants to merge 1 commit intofacebookresearch:mainfrom
talgalili:export-D92749122

talgalili commented Feb 9, 2026

Uh oh!

meta-codesync bot commented Feb 9, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI Feb 9, 2026

Uh oh!

Copilot AI Feb 9, 2026

Uh oh!

Copilot AI Feb 9, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI Feb 9, 2026

Uh oh!

meta-codesync bot commented Feb 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	only these columns (see :meth:`filter_output`).
	only these columns (see :meth:`adapt_output`).

Conversation

talgalili commented Feb 9, 2026

Uh oh!

meta-codesync bot commented Feb 9, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

Uh oh!

meta-codesync bot commented Feb 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants