Fix schema mismatch error in decompose when input has extra columns #13
Conversation
- Explicitly force column projection in `decompose.py` to prevent Ibis/DuckDB from returning dropped intermediate columns.
- Workaround involves applying a no-op to the `prediction` column to defeat `SELECT *` optimization.
- Add regression test `tests/test_issue_12.py`. Fixes #12
Pull request overview
This PR fixes a schema mismatch error in `traffic_anomaly.decompose` that occurred when input DataFrames contained extra columns not used by the function. The issue stemmed from an Ibis/DuckDB optimization that could convert explicit column selections to `SELECT *`, inadvertently including dropped intermediate columns.
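The optimization at the heart of the bug can be seen by compiling two otherwise-equivalent projections. A minimal sketch, assuming `ibis-framework` with the DuckDB backend; the table and column names here are illustrative, not taken from the PR:

```python
import ibis

# Illustrative in-memory table; the column names are assumptions for this sketch
t = ibis.memtable({"prediction": [1.0, 2.0], "extra": ["a", "b"]})

# A plain selection of existing columns may be simplified by the compiler,
# ending up as a bare SELECT * over the underlying relation
plain = t.select("prediction", "extra")

# A computed expression (the + 0.0 no-op) cannot be simplified away,
# so every output column must be listed explicitly in the generated SQL
forced = t.select(
    (t.prediction + 0.0).cast(t.prediction.type()).name("prediction"),
    t.extra,
)

print(ibis.to_sql(plain, dialect="duckdb"))
print(ibis.to_sql(forced, dialect="duckdb"))
```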
Key changes:
- Added a workaround that forces explicit column projection by applying a no-op calculation (`+ 0.0`) to the `prediction` column before execution
- Added regression test `tests/test_issue_12.py` to verify the fix handles extra columns gracefully
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/traffic_anomaly/decompose.py | Implements workaround to force explicit column selection instead of SELECT * by applying no-op transformation to prediction column |
| tests/test_issue_12.py | Adds regression test verifying decompose handles DataFrames with extra columns (including SQL keywords) without raising schema mismatch errors |
```python
# We force an explicit projection by applying a no-op to the 'prediction' column.
cols = []
for c in result.columns:
    if c == 'prediction':
        # Add 0.0 and cast back to original type to force expression generation
        # This prevents Ibis from optimizing to SELECT *
        cols.append((result[c] + 0.0).cast(result[c].type()).name(c))
    else:
        cols.append(result[c])
```
The workaround hardcodes the column name 'prediction' on line 175. If the prediction column is ever renamed in the future, this workaround will silently fail to prevent the SELECT * optimization, and the original bug could resurface. Consider making this more maintainable by either documenting this dependency clearly or finding a column-agnostic solution (e.g., applying the no-op transformation to all columns, or to the first/last column).
Suggested change:

```python
# We force an explicit projection by applying a no-op to each column.
cols = []
for c in result.columns:
    # Add 0.0 and cast back to original type to force expression generation
    # This prevents Ibis from optimizing to SELECT *
    cols.append((result[c] + 0.0).cast(result[c].type()).name(c))
```
```python
# Verify extra columns are present (assuming default behavior preserves them or at least doesn't crash)
# The current implementation of decompose (based on reading code) seems to preserve columns in 'table'
# but filters rows.
# However, if drop_extras=True (default), it drops specific calculation columns.
# It does NOT explicitly drop unknown columns.
```
The comment mentions "if drop_extras=True (default), it drops specific calculation columns" but the actual behavior is being verified by asserting that 'group' and 'safe_col' are present. Since drop_extras defaults to True and only drops 'median', 'season_day', 'season_week', this comment is technically correct but could be clearer. Consider simplifying the comment to just state what's being verified: that extra input columns are preserved in the output.
Suggested change:

```python
# Verify that extra input columns are preserved in the output
```
```python
datetime_column='Date Time',
value_column='Speed(miles hour)',
entity_grouping_columns=['XDSegID'],
rolling_window_enable=False # Disable rolling window to keep it simple, issue happens regardless
```
Missing space before the inline comment; PEP 8 recommends two spaces before an inline `#` comment.
Suggested change:

```python
rolling_window_enable=False  # Disable rolling window to keep it simple, issue happens regardless
```
```python
cols = []
for c in result.columns:
    if c == 'prediction':
        # Add 0.0 and cast back to original type to force expression generation
        # This prevents Ibis from optimizing to SELECT *
        cols.append((result[c] + 0.0).cast(result[c].type()).name(c))
    else:
        cols.append(result[c])

return result.select(cols).execute()
```
The workaround applies a transformation (adding 0.0 and casting) to the 'prediction' column for every row in the result set. While this successfully prevents the SELECT * optimization, it may have a performance impact on large datasets. Consider documenting this trade-off in a comment, or exploring if there's a more efficient way to force explicit column selection (such as using Ibis API features to control SQL generation if available).
```python
# Add 0.0 and cast back to original type to force expression generation
# This prevents Ibis from optimizing to SELECT *
cols.append((result[c] + 0.0).cast(result[c].type()).name(c))
```
The no-op transformation (result[c] + 0.0).cast(result[c].type()) assumes that the 'prediction' column is a numeric type that supports addition. While this is true for the current implementation (prediction is the result of ibis.greatest which returns a float), this assumption is not validated. If the implementation changes and prediction becomes a different type, this could fail silently or raise a runtime error. Consider adding a comment explaining this assumption or adding a type check.
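One way to encode that assumption is a type guard. A minimal sketch reusing the `result` expression from the diff above; the non-numeric fallback is an assumption of this sketch, not part of the PR:

```python
# Sketch: apply the arithmetic no-op only when 'prediction' is numeric
# (Ibis's DataType exposes is_numeric()); otherwise keep a plain reference,
# accepting that a plain reference may not defeat the SELECT * optimization.
cols = []
for c in result.columns:
    col = result[c]
    if c == 'prediction' and col.type().is_numeric():
        cols.append((col + 0.0).cast(col.type()).name(c))
    else:
        cols.append(col)
```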
```python
data = {
    'XDSegID': [1, 1, 1],
    'Miles': [0.5, 0.5, 0.5],
    'Date Time': [
        datetime(2023, 1, 1, 10, 0),
        datetime(2023, 1, 1, 10, 15),
        datetime(2023, 1, 1, 10, 30)
    ],
    'Speed(miles hour)': [60, 65, 55],
    'travel_time_sec': [30, 27, 32],
    'group': ['A', 'A', 'A'], # Extra column 1 (SQL keyword)
    'safe_col': ['B', 'B', 'B'] # Extra column 2
}
```
The test data contains only 3 rows, but the default value for min_time_of_day_samples is 7. This means the decompose function will filter out all rows where time_of_day_count < 7, likely resulting in an empty DataFrame. The test should either provide more data (at least 7 rows with the same time_of_day) or explicitly set min_time_of_day_samples to a lower value (e.g., 1 or 3) to ensure the test produces meaningful results.
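A sketch of the second option, pinning the threshold in the test; the `min_time_of_day_samples` name is taken from the comment above, and treating it as a keyword argument of `decompose` is an assumption here:

```python
result = traffic_anomaly.decompose(
    data=df,
    datetime_column='Date Time',
    value_column='Speed(miles hour)',
    entity_grouping_columns=['XDSegID'],
    rolling_window_enable=False,
    min_time_of_day_samples=1,  # keep the 3-row fixture from filtering down to an empty frame
)
assert not result.empty  # the regression check now runs against real rows
```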
```python
try:
    result = traffic_anomaly.decompose(
        data=df,
        datetime_column='Date Time',
        value_column='Speed(miles hour)',
        entity_grouping_columns=['XDSegID'],
        rolling_window_enable=False # Disable rolling window to keep it simple, issue happens regardless
    )
except ValueError as e:
    pytest.fail(f"decompose raised ValueError with extra columns: {e}")
except Exception as e:
    pytest.fail(f"decompose raised unexpected exception: {e}")
```
The try-except pattern with pytest.fail is less idiomatic than simply allowing the exception to be raised naturally. In pytest, if an unexpected exception occurs, the test will fail automatically with a clear traceback. The current approach obscures the actual error details. Consider removing the try-except block and letting the test fail naturally if decompose raises an error, which would provide better debugging information.
Suggested change:

```python
result = traffic_anomaly.decompose(
    data=df,
    datetime_column='Date Time',
    value_column='Speed(miles hour)',
    entity_grouping_columns=['XDSegID'],
    rolling_window_enable=False # Disable rolling window to keep it simple, issue happens regardless
)
```
This PR fixes an issue where `traffic_anomaly.decompose` would raise a `ValueError: schema names don't match input data columns` when the input DataFrame contained extra columns not used by the function.

The root cause was identified as an interaction between `ibis-framework` (v11) and `duckdb` where dropped intermediate columns were still present in the executed SQL result (via `SELECT *`), causing a mismatch with the expected schema.

The fix forces Ibis to generate an explicit column selection by applying a no-op calculation (`+ 0.0`) to the `prediction` column in the final projection. This ensures the backend returns exactly the columns defined in the schema.

A regression test `tests/test_issue_12.py` has been added.

PR created automatically by Jules for task 6302739600849262556 started by @ShawnStrasser
Note
Addresses a schema mismatch when executing `decompose` with input tables that include extra columns.

- Forces an explicit projection before `.execute()` by applying a no-op to `prediction`, preventing Ibis/DuckDB from collapsing to `SELECT *`, which could re-include dropped intermediates
- Leaves `return_sql` and Ibis-expression inputs unchanged; only affects the non-Ibis execution path
- Adds `tests/test_issue_12.py` covering DataFrames with extra/keyword-like columns

Written by Cursor Bugbot for commit 71a9ec5. This will update automatically on new commits.